feature: improve provenance and make q2-preview editable#231

Draft
gordonwoodhull wants to merge 104 commits into main from feature/provenance

Conversation

@gordonwoodhull
Member

@gordonwoodhull gordonwoodhull commented May 22, 2026

Draft PR for CI.

Provenance epic plans 3-7 are complete; provenance data is flowing and impossible edits to atomic elements are both blocked on the front end and soft-dropped by the incremental writer.

Next up:

  • Plans 7a-7c: more testing and a soft drop warning for attempting to edit before first render
  • Plan 8: use a custom node for include shortcode
  • Plan 9: expose provenance of YAML values used in transforms/meta shortcode
  • Plan 10: consistent provenance of Lua-produced content, so that you could, e.g., cmd-click some content and go to the line of Lua code that produced it.

gordonwoodhull added a commit that referenced this pull request May 25, 2026
The hub-client-e2e.yml `paths:` filter only fires the workflow when a
commit touches `hub-client/**` or the workflow file itself. It does not
follow transitive Rust deps, so PRs that modify upstream crates the WASM
bundle depends on — `quarto-core`, `quarto-pandoc-types`, `quarto-source-map`,
`pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc. — silently
skip e2e.

Two recent misses:

- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and
  `pollster::block_on` introduced in `quarto-core` broke 8 hub-client
  WASM tests on main. e2e never ran because the change was under
  `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across
  `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently
  skipped on every push despite the PR materially changing the WASM
  bundle's behavior.

Fix: drop the `paths:` filter outright and match the trigger shape of
the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`).
Also adds a `concurrency:` block (lifted from `test-suite.yml`) so
superseded runs on a PR get cancelled in flight — keeps the runner
cost from compounding.

Closes bd-izh3. The original ask there was to add a PR trigger with a
*broader* path filter; that approach still wouldn't catch the upstream-
crate case, so we go the coarser route the issue's spirit calls for.
The runner-sizing open question in bd-izh3 is also resolved — ae8274a
confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the
full suite in 5.3-8.1 min.

`kyoto` deliberately omitted from the branch list: `origin/kyoto` last
moved 2026-02-02 and is 825 commits behind main; the sibling workflows
still reference it but that's cargo-cult.
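
A minimal sketch of why a prefix-based `paths:` filter cannot stand in for a dependency check. The crate graph edges, the `crates/<name>/` layout, and every function name below are assumptions made for illustration; they are not taken from the workflow or from the workspace's real `Cargo.toml` files.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical crate graph: crate name -> direct dependencies.
type DepGraph = HashMap<&'static str, Vec<&'static str>>;

/// Collect `root` plus everything it depends on, directly or transitively.
fn transitive_deps(graph: &DepGraph, root: &'static str) -> HashSet<&'static str> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        // `insert` returns false for crates we have already visited.
        if seen.insert(name) {
            if let Some(deps) = graph.get(name) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    seen
}

/// What a `paths:` filter does: a pure prefix match on changed file paths.
fn path_filter_fires(changed: &[&str], prefixes: &[&str]) -> bool {
    changed
        .iter()
        .any(|path| prefixes.iter().any(|prefix| path.starts_with(prefix)))
}

/// What the bundle needs: did any changed file live in a crate it links?
/// Assumes (hypothetically) that each crate lives under `crates/<name>/`.
fn bundle_affected(changed: &[&str], graph: &DepGraph, root: &'static str) -> bool {
    let linked = transitive_deps(graph, root);
    changed.iter().any(|path| {
        linked
            .iter()
            .any(|krate| path.starts_with(format!("crates/{krate}/").as_str()))
    })
}

/// Illustrative edges only; the real dependency edges are not shown here.
fn example_graph() -> DepGraph {
    HashMap::from([
        ("wasm-quarto-hub-client", vec!["quarto-core"]),
        ("quarto-core", vec!["quarto-source-map", "pampa"]),
        ("quarto-source-map", vec![]),
        ("pampa", vec![]),
    ])
}
```

With a change under `crates/quarto-core/`, `path_filter_fires(.., &["hub-client/"])` is false while `bundle_affected` is true, which is the gap the commit closes by dropping the filter rather than widening it.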
gordonwoodhull added a commit that referenced this pull request May 25, 2026
…nce)

bd-izh3 closed by 016894a on feature/provenance (PR #231). The patch
drops the hub-client-e2e.yml path filter outright so the workflow fires
on every PR like the sibling heavy workflows — strictly broader than
the original 'add PR trigger with broader filter' proposal, since path
filters can never follow transitive Rust deps.

Incidental: bd-cxara has its 'source_repo_path' field stripped (was a
stale absolute path from shikokuchuo's local clone; harmless flush).
gordonwoodhull added a commit that referenced this pull request Jun 1, 2026
The hub-client-e2e.yml `paths:` filter only fires the workflow when a
commit touches `hub-client/**` or the workflow file itself. It does not
follow transitive Rust deps, so PRs that modify upstream crates the WASM
bundle depends on — `quarto-core`, `quarto-pandoc-types`, `quarto-source-map`,
`pampa`, `quarto-ast-reconcile`, `wasm-quarto-hub-client`, etc. — silently
skip e2e.

Two recent misses:

- f96f56d (Carlos, 5/22): WASM-incompatible `Instant::now()` and
  `pollster::block_on` introduced in `quarto-core` broke 8 hub-client
  WASM tests on main. e2e never ran because the change was under
  `crates/`, not `hub-client/`.
- PR #231 (feature/provenance, this branch): 57 files modified across
  `crates/` and `ts-packages/`, zero under `hub-client/`. e2e silently
  skipped on every push despite the PR materially changing the WASM
  bundle's behavior.

Fix: drop the `paths:` filter outright and match the trigger shape of
the sibling heavy workflows (`test-suite.yml`, `ts-test-suite.yml`).
Also adds a `concurrency:` block (lifted from `test-suite.yml`) so
superseded runs on a PR get cancelled in flight — keeps the runner
cost from compounding.

Closes bd-izh3. The original ask there was to add a PR trigger with a
*broader* path filter; that approach still wouldn't catch the upstream-
crate case, so we go the coarser route the issue's spirit calls for.
The runner-sizing open question in bd-izh3 is also resolved — ae8274a
confirmed `ubuntu-latest` (2 cores, 2 Playwright workers) handles the
full suite in 5.3-8.1 min.

`kyoto` deliberately omitted from the branch list: `origin/kyoto` last
moved 2026-02-02 and is 825 commits behind main; the sibling workflows
still reference it but that's cargo-cult.
Audit and revise Plans 3-8 of the q2-preview series (now framed
internally as the provenance epic) after a design discussion that
followed the q2-preview pipeline and attribution work landing on main.

Major design changes folded into the plans:

- **Plan 4 unified Generated variant.** Collapse the earlier
  `Synthetic` + `Derived` split into one `Generated { by, anchors: Vec<Anchor> }`
  shape. Atomicity is per-`by.kind` (orthogonal to anchors); the
  invocation source byte range is the first anchor with role
  `AnchorRole::Invocation`. One wire-format code (4) instead of two.

- **Plan 4/5/6 typed anchors (Path C).** Instead of stuffing
  source-info chain metadata into `by.data` (dynamic JSON), the chain
  is a typed `Vec<Anchor>` where each `Anchor` carries an `Arc<SourceInfo>`
  and a role-labeled `AnchorRole` (`Invocation`, `ValueSource`,
  `Other(String)`). `by.data` shrinks to per-kind non-source-info
  configuration. Two future-anchor roles flagged as follow-ups
  contingent on metadata-loader and Lua-file-registration work.

- **Plan 6 uniform shortcode anchor stamping.** Single funnel covers
  Rust built-ins, Lua-loaded extension handlers, and user-extension
  shortcodes uniformly via a post-walk `stamp_shortcode_anchors` helper.
  Enrichment-via-post-walk preserves Lua-attached `by.data` fields
  (lua_path, lua_line) while promoting `by.kind` to `shortcode`.
  Attribution interaction documented: multi-author shortcodes get
  latest-wins via the existing `query_byte_range` max-time logic
  composed with chain-walking through the `Invocation` anchor.

- **Plan 5 latent code-3 bug now reachable.** Plans 1-2 shipped the
  q2-preview pipeline that runs filters whose output crosses the JSON
  boundary; the FilterProvenance code-3 round-trip bug is no longer
  latent in production. Added end-to-end production-reachability
  regression test using the `{{< kbd Ctrl+C >}}` fixture (kbd.lua
  constructs a Span that gets FilterProvenance-tagged and then
  shortcode-stamped). Drops code 5 from the design.

- **Plan 7 SPA edit-back in scope.** The new q2 preview CLI command
  serves a separate SPA from ts-packages/preview-renderer; both
  hub-client and the SPA share the writer machinery via @quarto/preview-runtime.
  Plan 7 now covers replacing `noopSetAst` in the SPA with a real
  handler that routes through `incrementalWriteQmd` to
  `syncClient.updateFileContent` and the ephemeral hub's automerge↔disk
  bridge. Adds a small SPA-local `DiagnosticStrip` for Q-3-42/Q-3-43;
  hub-client's existing diagnostics-banner handles the same warnings
  there. Single-file mode (bd-tnm3k) works through the same automerge
  stack — no special case.

- **Plan 8 wrapper stays Original.** Explicit reasoning added for
  why `CustomNode("IncludeExpansion")` uses Original source_info
  (CustomNode.type_name carries generator identity; the wrapper
  substitutes 1:1 for the source-mapped Paragraph). HTML pipeline
  resolve transform in the Normalization Phase (symmetric with
  CalloutResolveTransform); HTML doesn't attribute the include line
  because there's no DOM anchor for it — accepted v1 behavior.
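
The unified `Generated` shape with typed anchors might look roughly like the sketch below. Only `Generated { by, anchors: Vec<Anchor> }`, `Arc<SourceInfo>`, the three `AnchorRole` variants, `by.kind`, and `by.data` come from the plan text; the field names on `Anchor`, the simplified `SourceInfo`, the `Provenance` enum name, the string-pair stand-in for `by.data`, and the choice of `shortcode` as an atomic kind are assumptions for illustration.

```rust
use std::sync::Arc;

/// Stand-in for the real source-info type; here it is only a byte range.
#[derive(Debug, PartialEq)]
struct SourceInfo {
    start: usize,
    end: usize,
}

#[derive(Debug, PartialEq)]
enum AnchorRole {
    Invocation,
    ValueSource,
    Other(String),
}

/// One typed link in the source-info chain (field names are hypothetical).
#[derive(Debug)]
struct Anchor {
    role: AnchorRole,
    source: Arc<SourceInfo>,
}

/// Generator identity. `data` stands in for per-kind, non-source-info
/// configuration (the plan describes it as dynamic JSON).
#[derive(Debug)]
struct By {
    kind: String,
    data: Vec<(String, String)>,
}

#[derive(Debug)]
enum Provenance {
    Original(Arc<SourceInfo>),
    Generated { by: By, anchors: Vec<Anchor> },
}

impl Provenance {
    /// The invocation byte range is the first anchor with role `Invocation`.
    fn invocation(&self) -> Option<&SourceInfo> {
        match self {
            Provenance::Generated { anchors, .. } => anchors
                .iter()
                .find(|a| a.role == AnchorRole::Invocation)
                .map(|a| a.source.as_ref()),
            Provenance::Original(_) => None,
        }
    }

    /// Atomicity is decided by `by.kind` alone and never inspects anchors.
    /// Treating `shortcode` as the atomic kind is an assumption of this sketch.
    fn is_atomic(&self) -> bool {
        matches!(self, Provenance::Generated { by, .. } if by.kind == "shortcode")
    }
}
```

Keeping the chain in a typed `Vec<Anchor>` means a consumer can find the invocation range with a plain iterator search instead of probing untyped JSON keys.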

Mechanical changes also folded in:

- Rename `Synthetic` → `Generated` throughout the type vocabulary in
  all plans.
- Update JS-side hand-mirror file paths (`hub-client/src/utils/...`
  → `ts-packages/preview-renderer/src/utils/...`) to reflect the
  Phase-D package split.
- Each plan's intro reframed as part of the provenance epic; file
  names keep the q2-preview-plan-N form for continuity.

File renames for clarity about which filters each plan covers:

- `…plan-3-filter-idempotence.md` → `…plan-3-builtin-filter-idempotence.md`
- `…plan-7a-filter-idempotence.md` → `…plan-7a-user-filter-idempotence.md`

Plans 3-8 remain in design state on this branch; no code changes yet.
Audit pass over the provenance epic's idempotence story, scoping Plan 3
to pipeline non-determinism only and propagating the consequences to the
neighbouring plans.

Plan 3 (builtin transform and filter idempotence):

- Retitle to "Built-in transform and filter idempotence verification" —
  symmetric across Rust transforms and Lua filters (prior framing was
  too narrow).
- Enumerate the actual universe under test: 36 Rust transforms in
  build_q2_preview_transform_pipeline (4 excluded, named with reasons),
  ~20 stage-level items in build_q2_preview_pipeline_stages, and the
  one Lua filter under resources/extensions/ (video-filter.lua). The
  prior "~10-20 filters" estimate misread shortcodes as filters.
- Drop the "Plan 3 strengthening" round-trip amendment that was added
  alongside Plan 7a in commit 2129d35. Round-trip non-idempotence is
  not exercised by today's pipeline; CI-time round-trip testing
  conflates writer-lossiness with filter-non-idempotence; 7a's runtime
  check is the better home for the property when Plan 7's writer
  ships. Trim "Two flavors" section to a pointer at 7a.
- Add compute_meta_hash_fresh / compute_meta_hash_fresh_excluding_rendered
  as a new helper in quarto-ast-reconcile, parallel to the existing
  block hasher. Hash covers blocks + meta (excluding rendered.*).
- Rewrite test pseudocode against the real run_pipeline API at
  pipeline.rs:626.
- Add fixture-format constraint: no executable engine cells (CI has
  no kernels).
- Coverage gap audit: ~25 fixtures across the document-level, Lua
  shortcode, website-project, attribution, and resource categories.
  Includes lua-shortcode-version, lua-shortcode-lipsum-fixed (non-random
  path), and video-filter-header for the one built-in Lua filter.
- Convert to a development-plan format with a seven-phase work-items
  checklist.
- Close the engine-staleness open question via filter.rs:158 (fresh
  Lua::new() per invocation).
- Clarify the lua-filter-pipeline reference as TypeScript Quarto
  porting material, not the Rust inventory.

Plan 6 (provenance audit):

- Add a §Test plan bullet for source_info determinism: Plan 3's hashes
  exclude source_info by design, so a per-fixture source_info-equality
  check is Plan 6's own responsibility.

Plan 7 (incremental writer):

- Add a writer-lossless baseline test as the first §Test plan bullet,
  prerequisite for the reconciler tests. Reuses Plan 3's fixture set.
- Add Plan 3 to §References and §Dependencies (soft-depends-on via
  compute_meta_hash_fresh).

Plan 7a (runtime user-filter idempotence):

- Remove all references to the now-deleted "Plan 3 strengthening"
  section (five locations including a full subsection).
- Reframe the out-of-scope bullet from "Strengthening Plan 3" to
  "Extending the runtime round-trip check to built-in filters," with
  three-point v1-acceptance reasoning in §Notes.
- Update §Design decisions, §Dependencies, and §References to reflect
  the new shape and the shared compute_meta_hash_fresh helper.
- Add the meta-hash comparison to step 4 of the round-trip check.

No code changes; design state only.
…ailure policy

Hash helper: `merge_op` participates (verified `MergeOp::default() =
Concat` is a stable compile-time constant); `Map` entries hashed in
insertion order, no sort (an idempotence test should *catch* the kind
of HashMap-iteration-order non-determinism a sort would mask). Adds
regression-guard unit tests for both choices.
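
To illustrate the two hashing choices, here is a small sketch using only the standard library. The `ConfigValue` and `Entry` shapes are simplified stand-ins, the `Prefer` variant is hypothetical (only `Concat` is named above), and `meta_hash` is not the real `compute_meta_hash_fresh`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Hash, Default)]
enum MergeOp {
    #[default]
    Concat,
    Prefer, // hypothetical second variant, for illustration only
}

/// Simplified stand-in for a metadata value tree.
enum ConfigValue {
    Scalar(String),
    /// Entries are kept in insertion order, as a Vec rather than a HashMap.
    Map(Vec<(String, Entry)>),
}

struct Entry {
    merge_op: MergeOp,
    value: ConfigValue,
}

fn hash_entry(entry: &Entry, h: &mut DefaultHasher) {
    // The merge op participates through its derived (discriminant) hash.
    entry.merge_op.hash(h);
    match &entry.value {
        ConfigValue::Scalar(s) => {
            0u8.hash(h);
            s.hash(h);
        }
        ConfigValue::Map(entries) => {
            1u8.hash(h);
            entries.len().hash(h);
            // No sort: reordered keys must produce a different hash.
            for (key, child) in entries {
                key.hash(h);
                hash_entry(child, h);
            }
        }
    }
}

fn meta_hash(root: &Entry) -> u64 {
    let mut h = DefaultHasher::new();
    hash_entry(root, &mut h);
    h.finish()
}

fn scalar(s: &str) -> Entry {
    Entry { merge_op: MergeOp::default(), value: ConfigValue::Scalar(s.to_string()) }
}

fn map(entries: Vec<(&str, Entry)>) -> Entry {
    let owned = entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect();
    Entry { merge_op: MergeOp::default(), value: ConfigValue::Map(owned) }
}
```

Two maps holding the same pairs in a different order hash differently here, which is the property that lets an idempotence test notice iteration-order non-determinism instead of hiding it behind a sort.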

Test runner: drives every fixture through both `DriveMode::SingleFile`
(direct `run_pipeline`) and `DriveMode::ProjectOrchestrator`
(`ProjectPipeline<RenderToPreviewAstRenderer>`) so orchestrator-only
non-determinism (project discovery, ProjectIndex assembly, file-iteration
order) is also under test. Website/chrome fixtures are
orchestrator-only by design.

Failure policy: failing fixtures stay **failing** — no auto-`#[ignore]`.
Each failure files a beads issue whose description doubles as a
sub-agent investigation prompt. The integration branch holds the
queue; merge to main waits until drained or the user explicitly opts
to ignore.

New helper `find_first_divergence` (alongside the hashers) returns
`DivergencePoint::{Block { index }, MetaKey { path }, None}` so the
test driver's panic message — and therefore the sub-agent prompt —
arrives with a concrete starting point instead of just "hash diverged."
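
A rough sketch of the divergence-localization idea follows. It works on precomputed hashes (one `u64` per block, one per top-level meta key) as a simplifying assumption; the real helper's inputs and signature differ, and only the `DivergencePoint` variant shapes are taken from the text above.

```rust
#[derive(Debug, PartialEq)]
enum DivergencePoint {
    Block { index: usize },
    MetaKey { path: Vec<String> },
    None,
}

/// Drop the top-level `rendered` entry before comparing meta keys.
fn without_rendered(meta: &[(String, u64)]) -> Vec<(String, u64)> {
    meta.iter()
        .filter(|(key, _)| key.as_str() != "rendered")
        .cloned()
        .collect()
}

/// Report the first block index, then the first meta key, whose hash differs.
/// Inputs are per-block hashes and (key, subtree hash) pairs in insertion
/// order; both are assumptions of this sketch.
fn find_first_divergence(
    blocks_a: &[u64],
    blocks_b: &[u64],
    meta_a: &[(String, u64)],
    meta_b: &[(String, u64)],
) -> DivergencePoint {
    // A missing block on either side counts as a divergence at that index.
    let block_count = blocks_a.len().max(blocks_b.len());
    for index in 0..block_count {
        if blocks_a.get(index) != blocks_b.get(index) {
            return DivergencePoint::Block { index };
        }
    }

    let (meta_a, meta_b) = (without_rendered(meta_a), without_rendered(meta_b));
    let key_count = meta_a.len().max(meta_b.len());
    for index in 0..key_count {
        let (a, b) = (meta_a.get(index), meta_b.get(index));
        if a != b {
            // Name whichever side still has a key at this position.
            let key = a.or(b).map(|(k, _)| k.clone()).unwrap_or_default();
            return DivergencePoint::MetaKey { path: vec![key] };
        }
    }
    DivergencePoint::None
}
```

A test driver can format the returned value into its panic message, so the failure report names a block index or meta key rather than only stating that two hashes differ.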

Orchestrator-mode `DocumentAst` extraction: researched the data flow;
the typed AST is materialized inside `render_qmd_to_preview_ast` but
discarded after JSON serialization. Plan recommends adding `pub ast:
DocumentAst` to `PreviewAstOutput` and forwarding through
`WasmPassTwoOutput`; alternatives (JSON re-parse, test-only hook)
documented with their costs.

Fixture rules: no absolute process paths in fixture content (built-in
extensions extract to a `temp_dir` whose path differs across CI runs;
stable within a single process — fine for two-runs-compare, but a
latent issue for future stored-snapshot variants).

Smaller corrections: `Format::from_format_string("q2-preview")` (no
`Format::q2_preview()` constructor exists); `apply_lua_filter`
(singular) is the per-filter Lua-state-creation site, with the plural
loop calling it once per filter; `LuaShortcodeEngine::new` is the
shortcode-side analogue; `quarto/video` filter extension is built-in
via `include_dir!(resources/extensions)` and auto-discovered by
`StageContext::new`, so fixtures need no scaffolding beyond `filters:
[video]` in YAML; `meta.rendered.includes.*` is the actual path
(not `meta.includes.*`) and includes contributions from
`IncludeResolveStage`, chrome render transforms, `attribution_viewer`,
and Bootstrap/clipboard injection — all skipped by
`compute_meta_hash_fresh_excluding_rendered`.

Stage-inventory clarifications: `MathJsStage` is excluded from
q2-preview; `BootstrapJsStage` and `ClipboardJsStage` write only to
`ctx.artifacts` (not to `meta` or `blocks`), so they don't affect the
hash — but their q2-preview inclusion is questionable and is filed
separately as bd-2ag1c.

Notes for the next traversal: `CodeHighlightStage`'s native disk scan
for user grammars is OS-order-dependent (not exercised today;
fixtures don't supply user grammars); lipsum's module-load
`math.randomseed(os.time())` is harmless on the non-random code path
the fixture exercises but should be reverified if a future variant
routes through `math.random`.
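
For reference, one common way to remove OS-order dependence from a directory scan is to collect the entries and sort them before use. The sketch below is a generic illustration with a hypothetical `list_sorted` helper; it is not the code in `CodeHighlightStage` and not a change this commit makes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// List a directory's entries in a stable, sorted order.
/// `fs::read_dir` yields entries in a platform- and filesystem-dependent
/// order, so the sort is what makes the result reproducible.
fn list_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut entries = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    entries.sort();
    Ok(entries)
}
```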

Estimated scope: ~760 → ~980 lines.
…branch policy

Audit pass against current source. Settles every open question that
remained in the prior revision and corrects factual drift.

Reuse over rebuild
- `DriveMode::ProjectOrchestrator` now delegates to the existing
  `render_active_page_preview` helper at
  `crates/quarto-core/tests/render_page_in_project.rs:660`. No fresh
  orchestrator wiring; no `make_website_project_ctx(...)` builder.
- `DocumentAst` extraction settled on option (a): re-parse the JSON
  via `pampa::readers::json::read`. source_info round-trips but the
  hash excludes it, so no stripping pass and no production plumbing
  change is required. Earlier option (b) (typed-AST plumbing through
  `PreviewAstOutput` / `WasmPassTwoOutput`) abandoned.
- `run_orchestrator` code sample updated: real body in place of the
  prior `unimplemented!("see Open questions")` stub.

Test crate location pinned
- File: `crates/quarto-core/tests/idempotence.rs`.
- Fixtures: `crates/quarto-core/tests/fixtures/idempotence/`.
- Cargo invocation in the sub-agent prompt template updated to
  `--test idempotence`.

Long-lived branch policy made explicit
- New `## Long-lived branch policy` section at the top.
- `## Goal` clarifies that "CI-enforced" applies when the plan lands
  on `main`; until then `feature/provenance` is allowed to be red
  while the failure queue drains.
- `### Phase 5 — Failure triage` opens with the same constraint.

Factual fixes against current source
- Transform count corrected from 36 to 37; missing
  `table-bootstrap-class` added to Finalization, with a fixture
  entry in the gap audit and Phase 4 checklist.
- `Q2_PREVIEW_STAGE_EXCLUDED` corrected to list all three exclusions
  (`math-js`, `render-html-body`, `apply-template`).
- `CodeHighlightStage` user-grammar scan citation moved from
  `pipeline.rs:644-650` to
  `crates/quarto-core/src/transforms/code_highlight.rs:126-129`.
- Stale line numbers refreshed throughout (pipeline.rs 1181→1198,
  1220→1237, 379→380, 355→356, 626→627, 855→859, 663→664;
  render_page_in_project.rs 653→660; Pass2Payload::AstJson 256→254;
  stage/context.rs 220→221; ShortcodeResolveTransform::transform
  257→513 with the correct file path).
- bd-2ag1c ordering pinned: Plan 3 lands first; bd-2ag1c follows
  with Plan 3's measurements in hand.

Section rename: "Open questions for implementation" →
"Decisions (was: open questions)" + a `### CI failure policy &
sub-agent prompt template` subsection. All internal cross-refs
updated.

Estimate revised
- Scaffolding line item: ~260 → ~100 lines (reuse, not rebuild).
- `PreviewAstOutput::ast` plumbing (~20 lines) removed entirely.
- Total: ~980 → ~800 lines.
- Session count revised 2 → 2-3 with the third explicitly allocated
  to Phase 5 triage.
Adds the structural-hash infrastructure that Plan 3's q2-preview
idempotence gate (and Plan 7a's runtime user-filter check) will sit on:

- compute_meta_hash_fresh: source-info-agnostic ConfigValue hasher.
  Insertion-order Map keys (no sort, so HashMap-iteration-order bugs
  in transforms remain detectable). MergeOp participates via its
  enum discriminant. Recurses into PandocInlines/PandocBlocks via
  the existing inline/block hashers (which already exclude
  source_info).
- compute_meta_hash_fresh_excluding_rendered: same, but skips the
  top-level `rendered` map entry. The exclusion is intentionally
  not propagated into recursion: a nested `rendered` key is content.
- find_first_divergence + DivergencePoint: returns the first block
  index whose per-block fresh hash differs, or the first insertion-
  order meta key path whose subtree hash differs (with the same
  rendered.* exclusion). The plan-sketch signature took
  &DocumentAst, but quarto-ast-reconcile cannot depend on
  quarto-core; the helper takes &[Block] + &ConfigValue and the
  test driver projects from DocumentAst.
- 11 new unit tests cover: same/different content, source_info/
  key_source agnosticism, top-level rendered exclusion, nested
  rendered participation, Map insertion-order sensitivity (no-sort
  regression guard), MergeOp sensitivity; identical/Block-mismatch/
  MetaKey-path/rendered-skip divergence localization.
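
The "top-level only" exclusion can be sketched as follows. The `Meta` type and both function names are simplified, hypothetical stand-ins for `ConfigValue` and the real hashers; the point is only that the `rendered` filter is applied once at the root and not inside the recursion.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for a metadata tree with insertion-ordered maps.
enum Meta {
    Str(String),
    Map(Vec<(String, Meta)>),
}

/// Full recursive hash: every key participates, including nested `rendered`.
fn hash_meta(value: &Meta, h: &mut DefaultHasher) {
    match value {
        Meta::Str(s) => {
            0u8.hash(h);
            s.hash(h);
        }
        Meta::Map(entries) => {
            1u8.hash(h);
            for (key, child) in entries {
                key.hash(h);
                hash_meta(child, h);
            }
        }
    }
}

/// Skip the `rendered` entry at the root only, then recurse normally.
fn hash_excluding_rendered(root: &Meta) -> u64 {
    let mut h = DefaultHasher::new();
    match root {
        Meta::Map(entries) => {
            for (key, child) in entries.iter().filter(|(k, _)| k.as_str() != "rendered") {
                key.hash(&mut h);
                hash_meta(child, &mut h);
            }
        }
        other => hash_meta(other, &mut h),
    }
    h.finish()
}
```

Changing the root `rendered` subtree leaves the hash untouched, while changing a `rendered` key nested under another entry changes it, matching the two unit-test cases listed above.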

Verification: `cargo nextest run --workspace` — 9321 passed, 196
skipped. `cargo xtask verify --skip-hub-build` steps 1–5 green
(lint, fmt, Rust build with -D warnings, tree-sitter, Rust tests
with -D warnings). Steps 7/10 fail with the known --skip-hub-build
artifact (`wasm-quarto-hub-client` unbuilt), unrelated to these
additive Rust changes.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
Adds the test driver that Phases 3-4 will hang ~25 fixtures off.
Self-contained at `crates/quarto-core/tests/idempotence.rs`.

- `DriveMode { SingleFile, ProjectOrchestrator }`. Single-file calls
  `run_pipeline` with `build_q2_preview_pipeline_stages`. Orchestrator
  drives `ProjectPipeline<RenderToPreviewAstRenderer>` via the existing
  `render_active_page_preview` body (copied inline because each
  `tests/*.rs` is its own binary).
- `Fixture { name, setup, active, modes }` + `run_fixture` runs the
  pipeline twice per (fixture, mode), hashes blocks via
  `compute_blocks_hash_fresh` and meta via
  `compute_meta_hash_fresh_excluding_rendered`, and on divergence
  panics with `find_first_divergence`'s `DivergencePoint` embedded so
  the panic message itself fills the plan's sub-agent investigation
  prompt template.
- `pandoc_to_document_ast` is the small field-shuffle that the plan
  identifies: orchestrator mode emits `Pass2Payload::AstJson`, which
  `pampa::readers::json::read` re-parses into `(Pandoc, ASTContext)`;
  the hasher only reads `ast.blocks` + `ast.meta` so the other
  `DocumentAst` fields get defaults.
- `tests/fixtures/idempotence/README.md` documents the fixture-format
  rules (no engine cells, no absolute paths, per-fixture mode mapping).
- `smoke_plain_paragraph` smoke fixture drives a single-paragraph
  document through both modes. Passing this proves the harness works
  end-to-end before Phases 3-4 land the real fixtures.
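
The run-twice-and-compare loop at the heart of the driver can be sketched as below. Only the `DriveMode` variant names come from the commit; the `Hashes` struct, the `check_idempotent` name, the closure-based pipeline stand-in, and returning a `Result` instead of panicking are assumptions made so the sketch stays self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DriveMode {
    SingleFile,
    ProjectOrchestrator,
}

/// Hypothetical summary of one pipeline run: a blocks hash and a meta hash.
struct Hashes {
    blocks: u64,
    meta: u64,
}

/// Run the pipeline twice per mode and compare the two hash pairs.
/// `run` stands in for "render this fixture in the given mode".
fn check_idempotent<F>(name: &str, modes: &[DriveMode], mut run: F) -> Result<(), String>
where
    F: FnMut(DriveMode) -> Hashes,
{
    for &mode in modes {
        let first = run(mode);
        let second = run(mode);
        if first.blocks != second.blocks || first.meta != second.meta {
            // The real driver panics here with a DivergencePoint embedded.
            return Err(format!(
                "fixture `{name}` diverged in {mode:?}: blocks {:#x} vs {:#x}, meta {:#x} vs {:#x}",
                first.blocks, second.blocks, first.meta, second.meta
            ));
        }
    }
    Ok(())
}
```

A fixture whose closure returns the same hashes on both calls passes; one that depends on hidden state such as a counter fails with the mode named in the message.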

Verification: `cargo nextest run -p quarto-core --test idempotence`
runs the new smoke test (PASS). `cargo xtask verify
--skip-hub-build --skip-hub-tests` steps 1-9 green; the Phase-1
idempotence tests and this Phase-2 smoke test ran inside Step 5.
Step 10 (preview-renderer integration tests in
`ts-packages/preview-renderer/`) fails with the same WASM-import
artifact as Step 7 — both depend on `wasm-quarto-hub-client` which
`--skip-hub-build` skips. Unrelated to these Rust-only additions.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
Adds the existing-fixture batch the plan calls "carry-forward from
prior plan draft": one fixture per Rust transform / feature that was
already exercised in earlier idempotence drafts, scoped to single-file
document fixtures that run in both DriveMode variants.

Coverage:
- meta-single, meta-markdown — shortcode-resolve + metadata-normalize
  (string and PandocInlines branches).
- include-trivial — include-expansion stage + shortcode-resolve.
- callout-warning — CalloutTransform (callout-resolve is excluded
  from q2-preview, so the CustomNode survives).
- theorem — TheoremSugarTransform.
- figure-ref-target — FloatRefTargetSugarTransform.
- crossref-to-theorem — crossref-index + crossref-resolve.
- sectionize-multi — SectionizeTransform across nested headers.
- footnotes-mixed — FootnotesTransform on inline + reference forms.
- appendix-license — AppendixStructureTransform with license/
  copyright meta and a footnote interaction.
- combined-stress — sectionize + callouts + shortcodes interacting.

A `doc_fixture(name, content)` helper collapses each single-file
fixture to a one-liner; `include-trivial` keeps an inline closure
because it writes two files.

All 12 idempotence tests (smoke + 11 new) pass:
  `cargo nextest run -p quarto-core --test idempotence` → 12 passed.

No queue entries for Phase 5 from this batch — the carry-forward
fixtures are all clean on first run.

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
npm install (from repo root) and npm run build:wasm (from hub-client)
updated package-lock.json and crates/wasm-quarto-hub-client/Cargo.lock
on this branch. Committed so subsequent fresh checkouts of
feature/provenance can build WASM from the same dependency set.
Adds the batch of Phase-4 fixtures that need no scaffolding beyond a
single-file `setup`. Per the long-lived-integration-branch policy,
fixtures that surface non-idempotence stay in the suite as the
triage queue.

Pass on first run (both DriveModes):
- code-block-fenced — code-block-generate / -render / code-highlight.
- proof — ProofSugarTransform.
- equation-labeled — EquationLabelTransform + crossref-resolve (eq).
- toc-on — toc-generate, toc-render.
- video-filter-header — built-in Lua filter under
  `resources/extensions/quarto/video/`.
- theme-bootstrap — compile-theme-css stage.
- table-bootstrap-class — TableBootstrapClassTransform.
- lua-shortcode-version — Lua-loaded shortcode handler (returns
  `quarto.version`).

In the queue:
- **lua-shortcode-lipsum-fixed**: `SingleFile` passes; the pipeline
  itself is idempotent. `ProjectOrchestrator` panics with
  `MalformedSourceInfoPool` re-parsing the AST JSON the orchestrator
  emitted. This is a JSON writer/reader round-trip bug specific to
  lipsum-shortcode-generated inlines, not a transform-determinism
  finding. Filed as **bd-3odjm**. The test stays red per the plan's
  "do not #[ignore]" rule; the integration branch is allowed to
  carry the failure until the queue is drained.

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 20 passed, 1 failed (bd-3odjm). Phase-1 unit tests and Phase-3
fixtures all green.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm
Both pass on first run in both DriveMode variants.

- include-in-header writes a tiny header.html and references it
  from front matter; exercises IncludeResolveStage.
- resource-image writes a 67-byte minimal PNG and references it via
  inline image syntax; exercises ResourceCollectorTransform.

Adds a write_bytes helper for the binary stub. Per the fixtures
README rule the PNG sits at the project root and is referenced
relatively (`./local.png`).

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 22 passed, 1 failed (bd-3odjm).
Three orchestrator-only website fixtures. Two pass, one in queue.

Pass:
- website-chrome — navbar + sidebar + page-navigation + page-footer
  + favicon + bootstrap-icons + canonical-url + title-prefix. Two
  pages (index, other), tiny favicon stub.
- website-listing — listing with categories enabled and feed: true,
  two posts under posts/, each with categories. Exercises
  listing-generate / -render, categories-sidebar, listing-feed-link,
  listing-feed-stage, listing-item-info.

In the queue:
- website-links — internal cross-page `.qmd` body links. Filed as
  bd-rz2we. Block 0 hash diverges across runs while meta hash is
  stable, so the divergence is genuinely in the AST blocks (not in
  rendered chrome). Hypothesis: link-rewrite or link-resolution is
  capturing the absolute project root (or canonicalized tempdir
  path) into the AST when it should emit a path-independent
  relative URL.

Verification: `cargo nextest run -p quarto-core --test idempotence`
→ 24 passed, 2 failed (bd-3odjm, bd-rz2we).

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-rz2we
Extends Fixture with an optional attribution_json: Option<&'static str>.
When present:
- SingleFile installs PreBuiltAttributionProvider on
  RenderContext.attribution_provider before run_pipeline.
- ProjectOrchestrator forwards the JSON via
  RenderToPreviewAstRenderer::with_attribution; the renderer
  installs the same provider type on the per-page RenderContext it
  constructs internally.

Stub JSON has one actor + one run covering bytes 0..1024 (a wider
range than the fixture body actually uses) so the attribution map
overlaps the entire document and AttributionGenerateStage +
AttributionRenderTransform have something to write into the AST.
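
As a rough picture of why one wide run is enough for the stub, here is a sketch of a byte-range attribution lookup with latest-wins semantics. The `Run` fields and this `query_byte_range` body are assumptions for illustration; the real attribution JSON schema and provider API are not shown in this PR text.

```rust
/// Hypothetical attribution run: one actor's edit over a half-open byte range.
#[derive(Debug, Clone, PartialEq)]
struct Run {
    actor: String,
    start: usize,
    end: usize,
    time: u64,
}

/// Return the most recent run overlapping `[start, end)`, if any.
/// Overlap test for half-open ranges: each range starts before the other ends.
fn query_byte_range(runs: &[Run], start: usize, end: usize) -> Option<&Run> {
    runs.iter()
        .filter(|run| run.start < end && start < run.end)
        .max_by_key(|run| run.time)
}
```

With a single run covering bytes 0..1024, every query inside a short fixture body resolves to the same actor, so the generate and render stages always have an attribution to write.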

`cargo nextest run -p quarto-core --test idempotence` → 25 passed,
2 failed (bd-3odjm, bd-rz2we — both pre-existing). attribution_basic
passes on first run in both DriveModes, so the deterministic
provider + generate + render stack is genuinely idempotent.

This completes the Phase 4 fixture set. The Plan-3 gate now covers:
- 1 smoke fixture
- 11 carry-forward (Phase 3, all green)
- 9 Phase-4a doc fixtures (8 green, 1 in queue)
- 2 Phase-4b multi-file (both green)
- 3 Phase-4c website (2 green, 1 in queue)
- 1 Phase-4d attribution (green)

Total: 27 fixtures, 25 green, 2 in queue.

Refs:
- claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
- bd-3odjm (Plan 5 will fix), bd-rz2we
Adds claude-notes/instructions/idempotence-contract.md — the
author-facing summary of the contract Plan 3 enforces. Covers:
- what the hash includes and excludes (source-info blind,
  insertion-order maps, merge_op participates, rendered.* excluded
  at top level only);
- what new transforms must NOT do (undefined iteration order,
  process-local state, absolute paths, engine cells);
- the fresh-Lua-state-per-run rule for Lua filters / shortcodes;
- how to add a fixture (doc_fixture for trivial, inline closure for
  multi-file, ORCHESTRATOR_ONLY for chrome, attribution_json for
  attribution exercises);
- the long-lived-integration-branch policy: don't #[ignore] a
  failing fixture without explicit user approval.

Cross-linked from:
- crates/quarto-core/tests/fixtures/idempotence/README.md
  (existing pointer expanded to point at the contract doc and the
  plan).
- claude-notes/plans/2026-05-04-q2-preview-plan-7a-user-filter-idempotence.md
  (References section — authors looking at the runtime user-filter
  check find the CI contract too).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
cargo nextest run --workspace: 9346/9348 pass. The 2 failures are
the documented queue items (bd-3odjm, bd-rz2we); every other
workspace test is green, including the 25 passing idempotence
fixtures.

cargo xtask verify (full WASM stack): Steps 1-4 green; Step 5
fails on the same 2 fixtures. That's the expected long-lived-
integration-branch state per the plan's §Long-lived branch policy —
the gate is allowed to be red until the queue is drained.

Plan 3 is complete as a deliverable: gate + hashing infrastructure
+ 27 fixtures + author-facing docs + filed queue. Merge to main
gated on draining the queue (bd-3odjm via Plan 5; bd-rz2we via a
follow-up).

Refs: claude-notes/plans/2026-05-04-q2-preview-plan-3-builtin-filter-idempotence.md
The Work-items section under Phase 1-7 was fully checked, but the
parallel "Coverage gaps to address during implementation" inventory
(per-fixture bullets, line ~560+) still showed unchecked boxes even
though every fixture in that list now ships in idempotence.rs.

Marked all 26 inventory items as landed. Annotated the two that are
in the Phase-5 triage queue (lipsum-fixed → bd-3odjm, website-links
→ bd-rz2we) so the queue state is also visible from the inventory,
not just from the Phase-5 work-items block.

Plan checklist is now fully consistent: 54 checked, 0 unchecked.
…erContext

Plan 3's website_links fixture was non-idempotent: rendered AST link
URLs captured the absolute tempdir path of the per-run TempDir,
causing block-0 hash divergence across two runs with different
tempdirs. Root cause: `ResourceResolverContext::vfs_root_mode`
played two roles via a single PathBuf — disk-write root (where
runtime.file_write puts theme CSS / copied resources) and URL
prefix (what gets embedded in HTML link/asset URLs). In production
WASM these are intentionally identical; on native they have to
diverge so writes hit a real tempdir but URLs stay path-independent.

Split the field into `{ write_root, url_root }` and add a two-arg
`vfs_root_with_url_root` constructor plus per-renderer
`with_url_root` builder. Single-arg `vfs_root(...)` constructor
preserves the WASM identity contract by construction (write_root ==
url_root). Native test helpers in tests/idempotence.rs and
tests/render_page_in_project.rs now pass
`.with_url_root("/.quarto/project-artifacts")`, so rendered URLs
embed the synthetic prefix while disk writes still land in the
tempdir.
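
The split can be sketched roughly as follows. This is an illustration under
assumptions, not the shipped type: the struct name `VfsRoot`, the `String`
type for `url_root`, and the `url_for` / `disk_path_for` helpers are
hypothetical; only the `write_root` / `url_root` field names and the
`vfs_root`, `vfs_root_with_url_root`, and `with_url_root` names come from the
description above.

```rust
use std::path::PathBuf;

/// Hypothetical reduction of the VFS-root mode to its two roles.
#[derive(Clone, Debug, PartialEq)]
pub struct VfsRoot {
    /// Where files are physically written (theme CSS, copied resources).
    pub write_root: PathBuf,
    /// Prefix embedded in rendered link/asset URLs.
    pub url_root: String,
}

impl VfsRoot {
    /// Single-argument form: both roles share one path (the WASM contract).
    pub fn vfs_root(root: impl Into<PathBuf>) -> Self {
        let write_root: PathBuf = root.into();
        let url_root = write_root.to_string_lossy().into_owned();
        Self { write_root, url_root }
    }

    /// Two-argument form for native callers that need the roles to diverge.
    pub fn vfs_root_with_url_root(
        write_root: impl Into<PathBuf>,
        url_root: impl Into<String>,
    ) -> Self {
        Self::vfs_root(write_root).with_url_root(url_root)
    }

    /// Builder-style override of the URL prefix only.
    pub fn with_url_root(mut self, url_root: impl Into<String>) -> Self {
        self.url_root = url_root.into();
        self
    }

    /// Path a resource is written to on disk (hypothetical helper).
    pub fn disk_path_for(&self, relative: &str) -> PathBuf {
        self.write_root.join(relative)
    }

    /// URL embedded in rendered output (hypothetical helper).
    pub fn url_for(&self, relative: &str) -> String {
        format!("{}/{}", self.url_root.trim_end_matches('/'), relative)
    }
}
```

With two different tempdirs as `write_root` and the same synthetic
`url_root`, the disk paths differ while the embedded URLs are identical,
which is what makes the fixture idempotent across runs.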

website_links now passes; 25/26 idempotence fixtures pass. The
remaining lipsum failure is bd-3odjm (FilterProvenance wire
format), owned by Plan 5 and out of scope here. Workspace nextest:
9347/9348. cargo xtask verify (Rust leg) clean for lint/fmt/build
with -D warnings.

Plan: claude-notes/plans/2026-05-21-vfs-url-write-root-split.md
Plan 4 (SourceInfo provenance types) finalized for development:
- 7-phase work-items checklist (types → constructors → accessor updates
  → Lua serde → migration → tests → verification gate)
- field renamed `anchors` → `from` (typed `SmallVec<[Anchor; 1]>` from
  day 1; serde feature required on smallvec)
- accessor semantics for `Generated` pinned: length/start_offset/
  end_offset → 0, map_offset → None, resolve_byte_range /
  remap_file_ids / extract_file_id delegate to invocation_anchor
- required-Invocation-anchor invariant on `shortcode` kind documented
  with `By::shortcode` doc-comment requirement; enforcement split
  across Plan 6 audit test and Plan 7 debug_assert
- Lua-table discriminant pinned to `t = "Generated"`
- §Test plan and Phase 6 expanded to cover every accessor + mutator
  + the `combine()` × Generated corner
- migration scope corrected (15 files, 27 occurrences); references
  and line ranges verified against the worktree source
- §Open questions section removed (no open questions remain)
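
A rough sketch of the pinned accessor semantics, assuming a two-variant
`SourceInfo` and a plain `Vec` in place of the `SmallVec` field. The
`AnchorRole` variants follow the roles named in these plans; the
`map_offset` behavior for `Original` and the exact method signatures are
assumptions made for illustration.

```rust
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub enum AnchorRole {
    Invocation,
    ValueSource,
    Dispatch,
    Other(String),
}

#[derive(Debug)]
pub struct Anchor {
    pub role: AnchorRole,
    pub target: Arc<SourceInfo>,
}

/// Reduced stand-in: the real enum has more variants (Substring, Concat).
#[derive(Debug)]
pub enum SourceInfo {
    Original { start_offset: usize, end_offset: usize },
    Generated { from: Vec<Anchor> },
}

impl SourceInfo {
    pub fn start_offset(&self) -> usize {
        match self {
            Self::Original { start_offset, .. } => *start_offset,
            Self::Generated { .. } => 0, // Generated owns no source bytes
        }
    }

    pub fn end_offset(&self) -> usize {
        match self {
            Self::Original { end_offset, .. } => *end_offset,
            Self::Generated { .. } => 0,
        }
    }

    pub fn length(&self) -> usize {
        self.end_offset() - self.start_offset()
    }

    pub fn map_offset(&self, offset: usize) -> Option<usize> {
        match self {
            Self::Original { start_offset, end_offset } => {
                let mapped = start_offset + offset;
                (mapped <= *end_offset).then_some(mapped)
            }
            Self::Generated { .. } => None,
        }
    }

    /// First anchor with the Invocation role, if any.
    pub fn invocation_anchor(&self) -> Option<&SourceInfo> {
        match self {
            Self::Generated { from } => from
                .iter()
                .find(|a| a.role == AnchorRole::Invocation)
                .map(|a| a.target.as_ref()),
            Self::Original { .. } => None,
        }
    }

    /// Generated delegates to its invocation anchor; empty `from` gives None.
    pub fn resolve_byte_range(&self) -> Option<(usize, usize)> {
        match self {
            Self::Original { start_offset, end_offset } => {
                Some((*start_offset, *end_offset))
            }
            Self::Generated { .. } => self.invocation_anchor()?.resolve_byte_range(),
        }
    }
}
```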

Cross-plan `from` rename swept across Plans 3, 5, 6, 7, 8.

Plan 5 JSON wire format (option D):
- outer JSON key `anchors` → `from` (matches Rust field name)
- inner anchor pool reference `from` → `si_id` (distinctive; avoids
  the `parent_id` tree-structure mental model that fits Substring's
  chain but not anchor references)
- Reader/writer code samples updated; TS-side `SourceInfoEntry`
  shape note updated

Plan 6 + Plan 7 hand-offs for the required-anchor invariant added.
Deferred follow-ups (Dispatch anchor, ValueSource anchor) cross-
referenced as bd-36fr9 and bd-129m3 (committed separately to main).
Plan 4 work happens on top of an integration branch carrying exactly
one failing test (lua_shortcode_lipsum_fixed orchestrator mode,
filed as bd-3odjm). That test's root cause is the wire-format
code-3 collision Plan 5 owns, so Plan 4 must not try to fix it
locally.

Plan 4:
- New §"Inherited pre-existing failure (bd-3odjm)" section between
  Out of scope and Work items. Explains the test, the panic shape,
  the root cause, and that any *other* failure in the idempotence
  suite is a Plan-4 regression.
- Phase 7 verification gate updated: cargo nextest expects exactly
  one failure (bd-3odjm); cargo xtask verify trips on the same one.

Plan 5:
- New §"Inherited failure that must close on Plan 5's first reader
  change (bd-3odjm)" section. Spells out the contract: Plan 5's
  first reader change must turn lua_shortcode_lipsum_fixed green.
  If it doesn't, the Plan-5 author has an immediate signal that
  either the reader discrimination is wrong or the lipsum path
  produces a code-3 shape neither arm handles — stop and focus on
  it before moving on.
- Test plan now cites bd-3odjm as the live first-iteration smoke
  check, ahead of the hand-constructed tests.

Both plans now read consistently with the state of feature/provenance.
Plan 4 committed `from: SmallVec<[Anchor; 1]>` as the field type, but
Plan 5's reader/writer + Plan 6's stamper code samples still used the
`vec![]` macro to construct it. Those samples would not compile if
taken literally — `vec!` produces a `Vec`, not a `SmallVec`. Switch
to `smallvec![]` everywhere `Generated.from` is constructed:

- Plan 5: 4 occurrences (legacy-Transformed code-3 reader; Anchor
  dedup test description; forward-compat test description; round-
  trip test description).
- Plan 6: 14 occurrences across §"Per-transform fixes",
  §"Lua-shortcode enrichment", §"The post-walk helper",
  §"Variant semantics summary" etc.

No semantic change — same constructions, just the macro that
actually returns the field type.
Plan 4 + Plan 5: change Generated.from's inline capacity from
SmallVec<[Anchor; 1]> to SmallVec<[Anchor; 2]> so the steady-state
post-follow-up shape (Invocation + ValueSource on meta/var; Invocation
+ Dispatch on Lua-handler shortcodes) stays heap-free. Cost is +16
bytes per empty Generated; saves a heap allocation on every
multi-anchor shortcode resolution.

Also folds in research findings that were tacit in the previous draft:

- Phase 1 smallvec line: replace "or verify present" hedge with the
  concrete two-file Cargo.toml edit (workspace + quarto-source-map),
  noting verified-absent.
- skip_serializing_if path: use the fully-qualified
  serde_json::Value::is_null (the short form is a frequent gotcha).
- By::raw policy: accept-all; forgery caught by Plan 6 audit + Plan 7
  debug_assert, not by constructor rejection.
- Anchor ordering: append order, stable across serde, at most one
  anchor per known role.
- extract_file_id: empty-from Generated returns None, matching
  FilterProvenance's behavior; both call sites in to_ariadne_report
  already tolerate None. Stays a private fn on DiagnosticMessage.
- Lua serde Concat recursion: legacy "FilterProvenance" inside a
  Concat piece is handled automatically; no .snap/.json fixtures
  contain the legacy tag.
- Default risk: no struct holding SourceInfo derives Default in
  quarto-pandoc-types; Default for SourceInfo itself stays unchanged.
- combine() × Generated: verified unreachable today (all 17 call
  sites combine Original/Substring shapes); the Phase 6 test
  documents intent for any future caller.
- PartialEq: no production call site compares SourceInfo today; the
  derive is required by Block/Inline but not load-bearing.
The previous "+16 bytes per Generated" note understated the cost by
~2.5x. Actual delta:

- Anchor = AnchorRole (32 bytes — String-bearing Other variant
  dominates) + Arc<SourceInfo> (8) = ~40 bytes.
- SmallVec<[Anchor; 1]> ≈ 48 bytes; SmallVec<[Anchor; 2]> ≈ 88 bytes
  on the stack — a 40-byte delta per SmallVec field.
- Since SourceInfo is an enum, its stack size is dictated by the
  largest variant, so every SourceInfo (Original/Substring/Concat
  too) grows by 40 bytes — not just Generated instances. Block/Inline
  carry SourceInfo by value, so the cost multiplies across the AST
  (tens-to-hundreds of KB on a large doc).

Plan keeps cap=2 — the trade is still defensible — but documents the
real cost honestly and notes Arc-boxing Generated as the next lever
if memory-per-node ever bites the q2-preview editor.
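
The "largest variant dictates the size" point can be checked with a small
stand-in. The types below are hypothetical and much smaller than the real
`Anchor` / `SourceInfo` (a fixed array replaces `SmallVec`), so the byte
counts will not match the 40-byte figure above; the sketch only shows that
raising inline capacity grows every value of the enum, whichever variant it
holds.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an anchor: a role tag plus a pointer-sized payload.
pub struct Anchor {
    pub role: u8,
    pub target: usize,
}

/// Hypothetical stand-in for SourceInfo, generic over inline anchor capacity.
pub enum Info<const CAP: usize> {
    Original { start: usize, end: usize },
    Generated { from: [Anchor; CAP] },
}

/// Bytes added to every `Info` value when inline capacity goes from 1 to 2.
/// `size_of` is a property of the type, so `Original` values pay it too.
pub fn growth_per_node() -> usize {
    size_of::<Info<2>>() - size_of::<Info<1>>()
}

/// Extra bytes for a document whose AST carries `nodes` values of `Info`.
pub fn growth_for_document(nodes: usize) -> usize {
    nodes * growth_per_node()
}
```

Calling `growth_for_document` with a realistic node count gives a quick
feel for how a per-node delta turns into tens or hundreds of KB.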
gordonwoodhull and others added 29 commits June 1, 2026 16:07
…::unknown

Drop the Pandoc-flavored naming. q2 isn't pandoc-centric and the
affected call sites aren't all Pandoc (CLI stdin, Lua handoff,
external filter binaries).

Renames:

- json::read_strict + json::read_lenient -> json::read (strict) +
  json::read_completing_source_info (the new lenient variant).
  The function name matches the surrounding read_<thing> convention
  in readers/json.rs (read_inline, read_block, read_attr_source,
  make_source_info). Says exactly what it does.

- By::external_pandoc -> By::unknown. Honest about what we know
  ("we don't know"), generic enough to cover all four outside-world
  call sites (qmd-syntax-helper, CLI stdin, external filter, Lua
  handoff).

Pool-slot constants chained via + 1 in writers/json.rs so future
reserved slots don't require hardcoded number changes:

  pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;
  pub const UNKNOWN_SOURCE_INFO_ID: usize = USER_EDIT_SOURCE_INFO_ID + 1;

SourceInfoSerializer::new() pre-pushes the slots in declaration order;
a unit test next to the constants asserts the pool entries match,
so adding or rearranging slots fails the test rather than silently
shifting IDs at consumer sites. The TS hand-mirror follows the same
pattern with a Rust-side CI parity test.
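
A sketch of the reserved-slot pattern, assuming a simplified pool. The
`PoolEntry` enum, `FIRST_FREE_SOURCE_INFO_ID`, and the `intern` / `pool`
methods shown here are hypothetical; the two `*_SOURCE_INFO_ID` constants and
`SourceInfoSerializer::new()` are the names used above.

```rust
/// Hypothetical pool entry; the real pool stores serialized SourceInfo values.
#[derive(Clone, Debug, PartialEq)]
pub enum PoolEntry {
    UserEdit,
    Unknown,
    Original { start: usize, end: usize },
}

pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;
pub const UNKNOWN_SOURCE_INFO_ID: usize = USER_EDIT_SOURCE_INFO_ID + 1;
/// First ID available to ordinary entries (hypothetical helper constant).
pub const FIRST_FREE_SOURCE_INFO_ID: usize = UNKNOWN_SOURCE_INFO_ID + 1;

pub struct SourceInfoSerializer {
    pool: Vec<PoolEntry>,
}

impl SourceInfoSerializer {
    pub fn new() -> Self {
        // Push order must match the declaration order of the constants.
        Self { pool: vec![PoolEntry::UserEdit, PoolEntry::Unknown] }
    }

    /// Append an entry and return its pool ID (no dedup in this sketch).
    pub fn intern(&mut self, entry: PoolEntry) -> usize {
        self.pool.push(entry);
        self.pool.len() - 1
    }

    pub fn pool(&self) -> &[PoolEntry] {
        &self.pool
    }
}

impl Default for SourceInfoSerializer {
    fn default() -> Self {
        Self::new()
    }
}
```

A unit test that indexes the fresh pool by each constant and compares
against the expected entry is enough to make a reordering fail loudly.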

Provenance-contract.md §2 catalog: drop external_pandoc row, add
unknown row noting it's the source_info-completing reader's
placeholder.

Co-authored-by: Claude <noreply@anthropic.com>
Three coordinated changes to the design doc:

1. Define "authored content" upfront, before the BP formal statement.
   Replaces "node-local content" everywhere. The new term carries both
   the structural aspect (excludes descendants' bytes) and the semantic
   aspect (producer-contract attests user authorship). Pipeline-
   generated nodes have no authored content by definition; the dispatch
   routes them to non-emitting rules.

   (P2) now reads cleanly: "the byte was produced by serializing the
   authored content of a single AST node n." Reader doesn't have to
   infer the user-authorship scope from the dispatch table.

2. Add the Completeness section as a dual to Soundness. Four clauses
   partition every byte:

   (C1) Preserved - Source bytes still claimed by AST_new appear.
   (C2) Authored  - non-soft-drop nodes' authored content appears.
   (R)  Refused   - soft-drop sites refuse authored content + warn.
   (D)  Deleted   - bytes no longer claimed don't appear.

   C-prefix denotes positive completeness (appears in Source'); R/D
   denote negative cases (doesn't appear). (C1)/(C2) are the duals of (P1)/(P2).
   "Soft-drop site" defined precisely as "UseAfter or RecurseIntoContainer
   AND editability gate returns not-editable." R5-special (let-user-win)
   is explicitly NOT a soft-drop site; it falls under (C2).

   Proof by structural induction over R1, R1', R2, R2', R5, R3/R4 cases.

3. Rename "What BP does not promise" to "What BP and Completeness do
   not promise". Reclassify the marker-fidelity / lazy-numbering /
   block-container shell-regeneration gaps as a single unified
   completeness gap: helper-emitted bytes don't preserve user-specific
   syntactic choices. Soundness still holds (helper output is honest
   authored content via P2); completeness fails for byte-level fidelity
   of the original syntactic form. Producer-hygiene caveat updated to
   note both invariants depend on it.

Plan 7d Phase 4 gains a companion property test: completeness_holds
(parse(Source') structurally equivalent to AST_new for non-soft-drop
inputs), alongside the existing bp_holds (no atomic-Generated bytes
leak). The two properties pin both invariants empirically.

Co-authored-by: Claude <noreply@anthropic.com>
Property tests verify every input satisfies the property, but say nothing
about which dispatch rules the generator actually exercises. Without
coverage assertions, a generator subtly biased toward easy cases (mostly
R1, rarely R5-special) gives a false sense of confidence.

Add thread-local DispatchCounters in plan_user_writes, gated behind a
dispatch-coverage build feature (zero cost in production). Each
dispatch row ticks per visit. Property tests assert per-row minimum
coverage after proptest completes; under-exercised rows fail with a
specific message naming the row.

Tuned thresholds:
  R1               >= 100   (most common; preserved content)
  R1' (soft-drop)  >= 50    (atomic-Generated edit refusal)
  R2 / R2'         >= 20    (omit / soft-omit)
  R3-helper        >= 50    (new container with helper shells)
  R3-transparent   >= 50    (sectionize wrapper recursion)
  R4               >= 30    (inline container preserved shells)
  R5               >= 50    (leaf serialization)
  R5-special       >= 20    (let-user-win atomic CustomNode replace)

Keeps the generator honest as the writer evolves: future contributors
adding a dispatch row must add a corresponding threshold; a future
change that accidentally makes a row unreachable surfaces as a coverage
failure rather than passing tests.
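
A minimal sketch of the counter mechanism, assuming a string-keyed map. The
function names `tick`, `reset`, and `under_exercised` are hypothetical, and
the feature gate is only described in a comment because a standalone snippet
has no Cargo features to enable.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    // In the real build this would sit behind the `dispatch-coverage`
    // feature so production code pays nothing for it.
    static DISPATCH_COUNTERS: RefCell<BTreeMap<&'static str, u64>> =
        RefCell::new(BTreeMap::new());
}

/// Record one visit to a dispatch row on the current thread.
pub fn tick(row: &'static str) {
    DISPATCH_COUNTERS.with(|c| *c.borrow_mut().entry(row).or_insert(0) += 1);
}

/// Clear all counters (call before a property-test run).
pub fn reset() {
    DISPATCH_COUNTERS.with(|c| c.borrow_mut().clear());
}

/// Return one message per row that did not reach its minimum visit count.
pub fn under_exercised(thresholds: &[(&'static str, u64)]) -> Vec<String> {
    DISPATCH_COUNTERS.with(|c| {
        let counts = c.borrow();
        thresholds
            .iter()
            .filter_map(|(row, min)| {
                let seen = counts.get(row).copied().unwrap_or(0);
                (seen < *min).then(|| {
                    format!("dispatch row {row} visited {seen} times, need >= {min}")
                })
            })
            .collect()
    })
}
```

After the property test finishes, asserting that `under_exercised(...)` is
empty yields a failure message that names each starved row.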

Co-authored-by: Claude <noreply@anthropic.com>
Add a framing sub-section at the top of Phase 4 that ties the four
testing pieces together as a coordinated strategy:

- Generator (gen_pandoc_with_atomic_descendants) — produces ASTs with
  atomic-Generated descendants at varying depths plus user-edits.
  Extends the existing quarto-ast-reconcile generators with two new
  capabilities: atomic-injection at configurable density, and realistic
  user-edit transformations.

- Marker-string convention for soundness (bp_holds) — fresh
  recognizable marker per iteration injected into atomic-Generated
  content; one-line assertion that it doesn't appear in Source'.

- Structural-equivalence reuse for completeness (completeness_holds) —
  reuses quarto_ast_reconcile::hash::compute_block_hash (already
  source-info-blind per hash.rs:498), which absorbs helper-
  canonicalization gaps at the AST level without bespoke matchers.

- Required dispatch-coverage instrumentation — the full spec stays in
  the work items below; the intro names it as the fourth coordinated
  piece.
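
The marker-string convention could look roughly like this. `fresh_marker`
and its format are hypothetical; `bp_holds` is the property name used above,
reduced here to its one-line core.

```rust
/// Build a marker that is distinct per iteration and unlikely to collide
/// with generated document text (the format is an assumption).
pub fn fresh_marker(iteration: u64) -> String {
    format!("ATOMIC_MARKER_{iteration:016x}")
}

/// Soundness check: content planted only inside atomic-Generated nodes
/// must not surface anywhere in the rewritten source.
pub fn bp_holds(source_prime: &str, marker: &str) -> bool {
    !source_prime.contains(marker)
}
```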

Closes the loose thread from the conversation: I had offered to write
this sub-section but only landed the coverage-counter piece. The four
pieces fit together; the intro makes the fit explicit so a future
implementer reading Phase 4 understands the strategy before the work
items.

Co-authored-by: Claude <noreply@anthropic.com>
Item 1 (Phase 4 — pool intern dedup): The serializer's intern cache is
strict Arc-pointer equality at parent edges only; it never dedups
top-level intern calls by value. Round-tripped completing-reader nodes
will get fresh pool entries structurally equal to the reserved slots.
Decision: accept the duplication (option a). Bounded, per-document,
cosmetic. Add a one-line comment near intern marking it intentional.
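
To illustrate why structurally equal values still get separate pool entries,
here is a simplified pointer-keyed intern cache. It is a sketch under
assumptions: `Pool`, `Info`, and `entry_count` are hypothetical, and unlike
the real serializer it consults the cache on every call rather than only at
parent edges.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub struct Info {
    pub start: usize,
    pub end: usize,
}

#[derive(Default)]
pub struct Pool {
    entries: Vec<Arc<Info>>,
    // Keyed by allocation address, not by value.
    by_ptr: HashMap<*const Info, usize>,
}

impl Pool {
    /// Return the existing ID for this exact allocation, else allocate one.
    pub fn intern(&mut self, info: &Arc<Info>) -> usize {
        let key = Arc::as_ptr(info);
        if let Some(&id) = self.by_ptr.get(&key) {
            return id;
        }
        let id = self.entries.len();
        // Holding a clone keeps the allocation alive, so the key stays valid.
        self.entries.push(Arc::clone(info));
        self.by_ptr.insert(key, id);
        id
    }

    pub fn entry_count(&self) -> usize {
        self.entries.len()
    }
}
```

Two separately allocated `Arc<Info>` values that compare equal receive two
IDs, which is the bounded duplication the decision above accepts.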

Item 2 (Phase 4 — per-caller reader-split verification): All five
outside-world callers consume source_info downstream, so the placeholder
choice matters. json_filter.rs gets By::filter(filter_path, 0); the
other four get By::unknown(). Signature change: read_completing_source_info
should accept default_by: By rather than baking unknown in, so callers
declare their provenance up front. Flag: qmd-syntax-helper's qmd::write
calls shift dispatch from R1-empty to R5-synthesize — the new behavior
is correct.

Item 3 (Phase 6.5 — reconciler "synthesis sites"): The line numbers in
the earlier draft pointed to test code (AttrSourceInfo::empty field
assignments in #[cfg(test)] blocks), not InlineAttr::new calls. The
three real production InlineAttr::new sites live in pampa's tree-sitter
lowering and pass non-empty attr_source; they need explicit source_info
wired through from the surrounding parse range. By::reconcile_synthesize
becomes a forward-looking primitive; no producer uses it at 7f-landing.

Item 4 (Phase 1 — renderCustomNodeChildren): Verified preserves s: via
{ ...customNode, slots: ... } spread at dispatch.tsx:274. Both CustomBlock
and CustomInline reach the same path. Move both from "needs verification"
to "preserves."

Open questions for review:
- By::filter atomic-kind concern for external filter output (item 2 table).
- Whether read_completing_source_info reuses UNKNOWN_SOURCE_INFO_ID when
  default_by == By::unknown() or always allocates fresh (recommend fresh
  for uniform path).
…_synthesize, expand 6.5 scope

Decisions locked in (2026-05-30 conversation):

- Keep USER_EDIT_SOURCE_INFO_ID = 0 magic number (framework can't allocate
  into the Rust pool; the slot ID must be agreed in advance).
- Drop UNKNOWN_SOURCE_INFO_ID and the second reserved slot. The completing
  reader takes `default_by: By` and allocates a fresh pool entry on every
  fill. No hand-mirror, no parity test for slot 1, no special case for
  `default_by == By::unknown()`.
- Drop By::reconcile_synthesize entirely — no producer uses it at 7f-landing.
- Add By::is_programmatic_sentinel() predicate covering config-default,
  programmatic-config, unknown. Replaces the navigation_href.rs equality
  check against SourceInfo::default(). No is_default() function needed.
- By::unknown is non-atomic. By::filter is atomic and the right semantic
  for json_filter.rs (filter-added nodes shouldn't be source-editable).
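
A sketch of the two predicates, assuming `By` is a kind string plus an
optional payload. The `data` representation, the `"filter"` kind string, the
meaning of `By::filter`'s second argument, and the `of_kind` helper are
assumptions; `is_atomic_kind` here models only the two kinds discussed in
this list, not the full atomic set.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct By {
    pub kind: String,
    /// Simplified payload (an assumption for this sketch).
    pub data: Option<String>,
}

impl By {
    /// Hypothetical helper for building a payload-free kind.
    pub fn of_kind(kind: &str) -> Self {
        Self { kind: kind.to_string(), data: None }
    }

    pub fn unknown() -> Self {
        Self::of_kind("unknown")
    }

    /// `index` is a guess at what the second argument means.
    pub fn filter(filter_path: &str, index: usize) -> Self {
        Self {
            kind: "filter".to_string(),
            data: Some(format!("{filter_path}#{index}")),
        }
    }

    /// True for placeholder kinds that carry no real source location.
    pub fn is_programmatic_sentinel(&self) -> bool {
        matches!(
            self.kind.as_str(),
            "config-default" | "programmatic-config" | "unknown"
        )
    }

    /// Atomic kinds are not source-editable; only `filter` is modeled here.
    pub fn is_atomic_kind(&self) -> bool {
        self.kind == "filter"
    }
}
```

A caller such as the navigation-href check would then ask
`by.is_programmatic_sentinel()` instead of comparing against a default value.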

Phase 3 walker fix:

  The previous walker used a 't' in value heuristic to recurse into
  CustomNode slots, which would have misread the Slot wrapper
  ({ kind, value }) as a non-AST object and silently failed to stamp
  anything inside slots. Rewritten to dispatch on slot.kind per the
  actual TS Slot discriminated union at
  ts-packages/preview-renderer/src/framework/types.ts:123-128.

Phase 6.5 expansion:

  Audit found additional production SourceInfo::default sites the plan
  missed:
  - config_value.rs:822, 826 (insert_path intermediates) → By::programmatic_config
  - project_resources.rs:541 (canonicalize_within_project sentinel) → By::unknown
  - navigation_href.rs:382 (equality check) → is_programmatic_sentinel pattern

  SchemaError::InvalidStructure scope corrected: 4 None sentinel sites
  (merge.rs:32/51/88, mod.rs:250), ~11 Some(value.source_info.clone())
  sites in helpers.rs, plus a formatter at error.rs:33-46. Plan
  previously claimed "four call sites" — undercounted by 3×.

Mechanical fixes:

  - InlineAttr::new line numbers 304-311 → 333-348 (the actual location).
  - JsonReadError line numbers 23/30 → 25/31.
  - writers/json.rs s:-bearing struct range 1010-1116 → 1068-1195.
  - Phase 7 deprecated Default impl: file: FileId(0) → file_id: FileId(0).
  - Phase 5: clarify the "remove the camelCase fallback" wording (no
    real fallback exists; the per-field rename overrides the macro).
  - ATOMIC_CUSTOM_NODES Rust + TS paths spelled out for the parity test.
  - attr.rs:45-46 stale doc-comment (claims SourceInfo::default fallback;
    real consumers fall back to None) noted for cleanup.
…lan 7d trust-point gate

Decisions locked in (2026-06-01):

- PandocNativeIntermediate::IntermediateAttr widens to carry SourceInfo
  alongside (Attr, AttrSourceInfo). Cleaner provenance than chasing
  source_info through three uneven call paths; one producer-side update
  versus three consumer-side refactors.
- q2-debug uses the framework's <Node>, so Phase 3 stampUserEdits comes
  for free. Only one q2-debug-local renderer (Figure at components.tsx:110)
  needs the Phase 2 spread-fix.
- Plan 7d's R5 trust point is enforced by `-D deprecated`. After
  Phase 7 lands the deprecation, denying it in CI turns every remaining
  SourceInfo::default() caller into a compile error. The compiler is
  the audit; no separate residue grep step needed.

Audit results (four background agents, 2026-06-01):

1. Cross-crate residue: green. The 447 quarto-core SourceInfo::default
   hits dramatically overstate exposure. Actual production residue
   beyond Phase 6.5's list: citeproc/output.rs:1274,
   quarto-config/materialize.rs:132/152/165,
   quarto-core/project/listing/feed/stage.rs:596/602. All added to
   Phase 6.5 work items.

2. derive(Default) on SourceInfo-bearing structs: false alarm. None of
   the five candidate structs actually contain a SourceInfo field. The
   deprecation won't fire on them. Phase 8 audit step downgraded to a
   no-op note.

3. ConfigValue::default semantics shift: safe. Only 2 production
   callers (include_expansion.rs:203,238); both construct a transient
   Pandoc wrapper and discard the .meta field without reading it.
   Migration sound.

4. Snapshot churn: 62 .snap files in crates/pampa/snapshots/json/
   (one directory). Other 167 snapshots unaffected. Phase 6's
   dispatch shift expected to produce zero snapshot diffs (the
   harness uses real-parsed AST, not defaults). Commit-split
   recommended: Phase 5 renames first, then Phase 4 pool-shift,
   then Phase 6 (expect no snap diffs).

Plan now reads end-to-end with bounded scope and a compile-time
enforcement mechanism. Ready for implementation.
… for rebase

Prepares feature/provenance for rebase onto origin/main, which landed the
integration-test consolidation (#239 / bd-xvdop): every crate now has a
single `tests/integration/<name>.rs` + `tests/integration/main.rs` binary
instead of one binary per `tests/<name>.rs`.

`idempotence.rs` is the only test file on this branch that is NEW (no
counterpart on main), so a straight rebase would land it in the deprecated
old layout with zero conflict and zero signal — silently reintroducing the
per-file-binary bloat #239 removed, caught by no lint or compile error.

Move it into the new layout now, as an explicit, reviewable, build-verified
commit, so the placement is a verified fact before the 85-commit replay
rather than a post-rebase chore:

  - git mv tests/idempotence.rs -> tests/integration/idempotence.rs
  - add tests/integration/main.rs registering `pub mod idempotence;`

On rebase, the new tests/integration/main.rs will collide (add/add) with
main's version (~34 modules); resolution is a trivial union (keep main's
list + idempotence). That loud conflict is the point — it can't be missed.

Verified on this branch (pre-rebase): integration binary compiles; all 27
idempotence tests pass under `binary(integration)`.

The genuinely-renamed test files (incremental_writer_tests.rs et al.) are
left for rebase rename-detection to follow + a post-rebase structural check.
Earlier note implied 7b might use hand-crafted JSON that would need
the strict-reader pattern. After reading 7b in full: it's qmd-focused
test coverage that constructs ASTs directly in Rust and exercises
the qmd writer (`incremental_write`, `compute_blocks_hash_fresh`).
No JSON reads or wire-format assertions.

7b ships after 7f. The interaction is API-surface-only — 7b's
authors write against the post-7f APIs from the start (for_test,
3-arg InlineAttr::new, widened IntermediateAttr). No rebase work
needed.
…_INFO_ID

Plan 7f's 2026-05-30 research findings dropped two earlier-draft items:

- `By::reconcile_synthesize()` — no producer uses it at 7f-landing
  time; remove from the By:: catalog. Add back later if a reconciler
  path appears that synthesizes new AST without an input SourceInfo
  to inherit from.
- `UNKNOWN_SOURCE_INFO_ID` reserved pool slot — the completing reader
  takes a `default_by: By` parameter and allocates a fresh pool entry
  per missing `s:`, so there's no slot 1. Rewrite the `By::unknown()`
  row to describe the actual mechanism.

Brings provenance-contract.md back in sync with the plan; pre-Phase-1
cleanup so the catalog matches what ships.
Wrap-rebuild renderers in `dispatch.tsx` and the q2-debug `Figure`
renderer were emitting a fresh `{ t: '<Tag>', c: newChildren }` object
on every child edit, dropping `s:` (and every other top-level field)
from the rebuilt parent. After Phase 2:

- 19 stripping renderers (Emph/Strong, the five flat inline wrappers
  via `makeFlatInlineRenderer`, Link/Image/Span/Quoted,
  Para/Plain/Header/BlockQuote/Div, BulletList/OrderedList/Figure) now
  rebuild via `{ ...node, c: ... }`.
- q2-debug's local Figure renderer at
  `hub-client/src/components/render/q2-debug/components.tsx:110`
  gets the same spread treatment.
- `dispatch.test.tsx` covers all 22 entries in the
  `renderChildrenRegistry`: 19 that previously failed and the 3 that
  already preserved (`Ast`, `CustomBlock`, `CustomInline`).

Preserving `s:` is a precondition for the strict JSON reader landing
in Plan 7f Phase 4. Without it, every child edit rebuilds an ancestor
with no source_info reference, which the strict reader would reject.
…7f Phase 3)

Wrap `<Node>`'s `setLocalAst` so every AST a user-edit affordance hands
up the chain has `s:` populated on every node. The walker:

- Stamps `s: USER_EDIT_SOURCE_INFO_ID` (slot 0) on any node lacking `s:`.
- Leaves preserved nodes (those with existing `s:`) untouched, so the
  Phase 2 rebuilt-wrapper path keeps the original parent's source_info.
- Recurses into `c:` (standard wrapper shape) and `slots:` (CustomNode
  shape, dispatched on `slot.kind`).
- Walks nested arrays inside `c:` so Header / Link / BulletList shapes
  stamp their inner inline arrays correctly. Tagged-marker values
  (`{t: 'DisplayMath'}`, `{t: 'SingleQuote'}`) get a spurious `s:`
  field; serde-tag-based reads ignore it (markers are deserialized
  by tag, not by struct), so this is harmless.

The atomic-gate noop path skips stamping, which would be wasted work
when the edit is dropped anyway. Stamping is per-node idempotent; outer-level
rewalking of a stamped subtree is a no-op.

`USER_EDIT_SOURCE_INFO_ID = 0` lands in
`ts-packages/preview-renderer/src/types/sourceInfo.ts` here, ahead of
Plan 7f Phase 4's Rust counterpart + hand-mirror parity test.

Three plan-mandated tests + four robustness tests in
`stampUserEdits.test.ts`: fresh Span stamping, rebuilt-wrapper
preservation, splice-in (new + preserved siblings), CustomBlock slot
recursion, `block`/`inline` single-value slot recursion, nested-array
walks (Header c[2], BulletList items), idempotence.
…an 7f Phase 4)

`By::unknown()` is the placeholder kind for nodes deserialized through
`json::read_completing_source_info` when the upstream producer doesn't
populate `s:` — qmd-syntax-helper's Pandoc subprocess output, CLI
`--from json`, Lua AST handoff. Non-atomic by design: nodes carrying
this kind remain editable in the preview; user edits re-stamp them as
`user_edit` on save.

Extends `test_by_is_atomic_kind` to assert non-atomicity, and adds a
`test_by_unknown_constructor` that pins `kind == "unknown"` + null
`data`. Phase 6.5's `is_programmatic_sentinel()` predicate will
recognize this kind alongside `config-default` and `programmatic-config`.
…se 4)

Splits the JSON reader's leniency into two named entry points:

- `json::read` becomes strict — nodes missing their `s:` reference fail
  with `JsonReadError::MissingSourceInfoRef { node_path }`. The node_path
  is best-effort (tag name + parent context); good enough for a debugger
  to find the responsible producer site without the plumbing cost of a
  precise JSON-pointer.
- `json::read_completing_source_info(reader, default_by: By)` fills
  missing `s:` with `Generated { by: default_by, from: [] }` in-place per
  node (no pool entries allocated on read — the writer creates the pool
  ID on re-serialize). Used by every site that consumes JSON from
  outside q2's source-tracking world.
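
The difference between the two entry points can be sketched on a toy tree.
Everything below is a simplification under assumptions: `RawNode`, `Node`,
`SourceRef`, and `convert` are hypothetical, `default_by` is a plain string
rather than a `By`, and the input is an in-memory tree rather than a JSON
reader. Only the function names and the `MissingSourceInfoRef { node_path }`
variant come from the description above.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum SourceRef {
    /// Reference into the source-info pool.
    Pool(usize),
    /// Filled in by the completing reader; `from` is empty, so omitted here.
    Generated { by: String },
}

/// Toy input node: `s` is the optional source-info reference.
pub struct RawNode {
    pub tag: String,
    pub s: Option<usize>,
    pub children: Vec<RawNode>,
}

#[derive(Debug, PartialEq)]
pub struct Node {
    pub tag: String,
    pub source: SourceRef,
    pub children: Vec<Node>,
}

#[derive(Debug, PartialEq)]
pub enum JsonReadError {
    MissingSourceInfoRef { node_path: String },
}

fn convert(
    raw: &RawNode,
    parent_path: &str,
    default_by: Option<&str>,
) -> Result<Node, JsonReadError> {
    // Best-effort path: tag names joined from the root.
    let node_path = if parent_path.is_empty() {
        raw.tag.clone()
    } else {
        format!("{parent_path}/{}", raw.tag)
    };
    let source = match (raw.s, default_by) {
        (Some(id), _) => SourceRef::Pool(id),
        (None, Some(by)) => SourceRef::Generated { by: by.to_string() },
        (None, None) => return Err(JsonReadError::MissingSourceInfoRef { node_path }),
    };
    let children = raw
        .children
        .iter()
        .map(|child| convert(child, &node_path, default_by))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Node { tag: raw.tag.clone(), source, children })
}

/// Strict: any node without `s` is an error.
pub fn read(raw: &RawNode) -> Result<Node, JsonReadError> {
    convert(raw, "", None)
}

/// Completing: nodes without `s` receive a Generated placeholder.
pub fn read_completing_source_info(raw: &RawNode, default_by: &str) -> Node {
    convert(raw, "", Some(default_by))
        .expect("completing reader never reports a missing `s`")
}
```

Nodes that already carry `s` pass through both readers unchanged; only the
treatment of the missing case differs.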

Five outside-world callers switched per the plan's per-caller table:

- `json_filter.rs` → `By::filter(filter_path, 0)` (atomic-kind for
  filter-added nodes; pass-through nodes keep their original `s:`).
- `qmd-syntax-helper/{definition_lists,grid_tables}.rs` →
  `By::unknown()`. Writer dispatch for these nodes shifts from
  R1-empty to R5-synthesize, which is the correct round-trip behavior.
- `pampa/src/main.rs` (CLI `--from json`) → `By::unknown()`.
- `pampa/src/lua/readwrite.rs` (Lua `pandoc.read(_, "json")`) →
  `By::unknown()`.

The strict reader catches two real writer bugs that previously
round-tripped silently through `SourceInfo::default()`:

1. `write_custom_block` and `stream_write_custom_block` synthesized
   `Plain`/`Div` wrappers for slot encoding without `s:`. Same shape
   in `write_custom_inline` / `stream_write_custom_inline` for the
   `Span` wrapper and the `[block content]` placeholder Str. All now
   inherit the parent CustomNode's `s_id`.
2. `Figure` did not emit `captionS` (Table did). Strict reader
   rejected Figure captions; added `captionS` to both the buffered
   and streaming Figure writers, and updated the Figure reader to
   consume it. Same shape as Table's `captionS`.

Tests:
- `json_reader_smoke_tests.rs` reads Pandoc-format fixtures under
  `tests/readers/json/` — switched to `read_completing_source_info`.
- `test_json_div_transforms.rs` mimics `--from json` with hand-crafted
  pampa JSON — switched to match `main.rs`.
- Full pampa suite (3903 tests) + workspace suite (9727 tests) green.

Required adding `quarto-source-map` as a direct dep of `qmd-syntax-helper`
(previously transitive through `pampa`).
…ool`→`p` (Plan 7f Phase 5)

Phase 5 of Plan 7f compacts two top-level JSON keys to match the
rest of the wire format's single-character convention.

Writer (`crates/pampa/src/writers/json.rs`):
* `#[serde(rename = "a")]` on `NodeWithAttrJson::attr_s`; field-order
  invariant preserved (a, c, s, t still alphabetic).
* `#[serde(rename = "p")]` on `AstContextJson::source_info_pool`;
  alphabetic order under `astContext` preserved (files,
  metaTopLevelKeySources, p).
* All 24 literal `"attrS"` keys and the 1 `"sourceInfoPool"` key
  in object-construction sites updated; doc comments + the Figure
  inline order-comment rewritten for the new alphabet (a, c,
  captionS, s, t).

Reader (`crates/pampa/src/readers/json.rs`): symmetric reads of the
new keys (14 sites + 1 pool key); error variant messages and the
deserializer doc-block now reference `p` and `a` while keeping
the human-readable name "source-info pool".

TS:
* `ts-packages/pandoc-types/src/types.ts` — 11 `attrS:` interface
  fields → `a:`; `RustQmdJson.astContext.sourceInfoPool` → `p`.
* `ts-packages/preview-renderer/src/types/sourceInfo.ts` &
  `framework/Ast.tsx` — `AstContext.p` is the wire-format key; the
  internal React-context field stays `sourceInfoPool` for readability.
* `ts-packages/annotated-qmd/src/{index.ts,block-converter.ts,
  inline-converter.ts}` — wire-format accesses
  (`block.attrS`/`inline.attrS`/`headS.attrS`/etc. → `.a`;
  `json.astContext.sourceInfoPool` → `.p`); internal parameter
  `attrS` renamed to `attrSource` for clarity. Tests, README,
  `debug-figure.js`, and `check_mismatches.py` follow.

Audit (2026-06-01) confirmed `hub-client/`, `q2-preview-spa/`, and
`crates/hub/` don't pattern-match on these keys — they delegate to
the TS type packages.

Snapshot regeneration:
* 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated
  via `INSTA_UPDATE=always cargo nextest run -p pampa`. Diff is
  pure key rename (`"attrS":`→`"a":`, `"sourceInfoPool":`→`"p":`)
  plus a refreshed snapshot-source header reflecting the post-
  bd-xvdop integration-tests layout (`tests/test.rs` →
  `tests/integration/test.rs`). Commit sequencing: Phase 5
  renames land first; Phase 4's pool-slot-0 commit follows and
  regenerates the same 62 files for the +1 ID shift.

Example-fixture regeneration:
* 20 `ts-packages/annotated-qmd/examples/*.json` + the
  `test/fixtures/math-with-attr.json` rebuilt by running
  `cargo run --bin pampa -- -t json -i <each>.qmd`. The committed
  fixtures dated from 2025-10-24 (commit 2b2337b) and were stale
  against multiple unrelated pampa releases; regeneration is
  required for the TS code (which now reads `a`/`p`) to find any
  data at all.

Docs:
* `claude-notes/designs/provenance-contract.md` — wire-format key
  references updated to `astContext.p`.
* `claude-notes/instructions/performance-profiling.md` — Python
  canonicalize snippet uses `astContext["p"]`.
* Historical plans/research notes intentionally retain `attrS` /
  `sourceInfoPool` since they describe state-as-of-then.

Verification:
* `cargo nextest run --workspace` → 9727 pass.
* `cargo xtask verify` (full hub-build leg) → all 12 steps green
  including WASM rebuild + q2-preview-spa bundle.
* `hub-client` unit tests → 82/82 pass.
* `preview-renderer` tests → 205/205 pass.

Known side-issue (not blocking): `annotated-qmd` shows 2/156 test
failures — pre-existing pampa source-tracking off-by-one (inline
code + div key-source spans capture a preceding whitespace byte).
Filed as `bd-1d6io` with suspected-cause investigation pointing
at commit `38e889ad` (2026-05-24, multi-line inline-code-span
tokenization rework). Phase 5 only renamed JSON keys; no offset
computation was touched.

Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md
…se 4 pool-slot)

The React framework's `stampUserEdits` walker (Plan 7f Phase 3) stamps
`s: USER_EDIT_SOURCE_INFO_ID` on every AST node a `setLocalAst` call
introduces without an existing `s:`. Until now the constant existed only
on the TS side (added in commit `7ac9f445`); the Rust writer never
pre-populated slot 0, so the stamp resolved to whatever happened to be
interned first in each document. Most stamps landed on benign
`Original{0..0}` entries, but the semantic was wrong — `s:0` should
*mean* "this came from a user edit", not "this was the first thing
interned." This commit makes the round-trip honest.

Writer (`crates/pampa/src/writers/json.rs`):
* `pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;` defined alongside
  `SourceInfoSerializer`. Docstring chains future reserved slots via
  `+ 1` and points at the TS hand-mirror.
* `SourceInfoSerializer::new()` now pre-pushes a
  `Generated{by: By::user_edit(), from: vec![]}` entry at index 0.
  The slot exists in every JSON document the writer produces
  regardless of whether any node references it.
* The 9 writer-side unit tests that asserted `pool.len() == N` after N
  interns now assert `USER_EDIT_SOURCE_INFO_ID + 1 + N` instead (using
  `let first_user_id = USER_EDIT_SOURCE_INFO_ID + 1;` locally) so a
  future second reserved slot doesn't silently break call sites.
* New `test_reserved_slot_user_edit` pins the layout: a fresh
  serializer has `pool[USER_EDIT_SOURCE_INFO_ID]` carrying
  `Generated{by: user_edit, from: [], r: [0,0]}`. Rearranging reserved
  slots fails this test rather than silently shifting IDs.
* New `test_user_edit_slot_id_matches_typescript_mirror` reads
  `ts-packages/preview-renderer/src/types/sourceInfo.ts` via
  `CARGO_MANIFEST_DIR`-relative path, parses the
  `export const USER_EDIT_SOURCE_INFO_ID = N;` literal, and asserts
  `N == 0`. Catches rename, restructure, or value drift on either side.
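
A minimal sketch of the reserved-slot layout described above, not the
actual `SourceInfoSerializer`. `Pool` and `PoolEntry` are hypothetical
stand-ins, and the real pool entries carry more data (`from`, `r`, etc.)
than this miniature does:

```rust
/// Reserved pool index for nodes introduced by a user edit (value from the commit).
pub const USER_EDIT_SOURCE_INFO_ID: usize = 0;

/// Simplified stand-in for a pool entry; the real entry type is richer.
#[derive(Debug, Clone, PartialEq)]
pub enum PoolEntry {
    /// A byte range in an original source file.
    Original { start: usize, end: usize },
    /// Synthesized content, tagged with the kind of producer.
    Generated { kind: &'static str },
}

/// Hypothetical miniature of the writer's pool (no deduplication).
pub struct Pool {
    entries: Vec<PoolEntry>,
}

impl Pool {
    /// A fresh pool already holds the reserved user-edit entry at index 0.
    pub fn new() -> Self {
        Pool {
            entries: vec![PoolEntry::Generated { kind: "user-edit" }],
        }
    }

    /// Appends an entry and returns its ID; the first caller-supplied
    /// entry therefore receives `USER_EDIT_SOURCE_INFO_ID + 1`.
    pub fn intern(&mut self, entry: PoolEntry) -> usize {
        self.entries.push(entry);
        self.entries.len() - 1
    }

    /// Number of entries, including the reserved slot.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Looks up an entry by ID.
    pub fn get(&self, id: usize) -> Option<&PoolEntry> {
        self.entries.get(id)
    }
}
```

Under this layout a test that interns N entries expects a length of
`USER_EDIT_SOURCE_INFO_ID + 1 + N`, which is the shape the updated unit
tests use.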

Reader-side and TS-side tests that construct their own pool literals
were left as-is — they're not asserting against the writer's
reserved-slot contract.
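
The TS-parity check in the writer list above can be sketched as a small
text scan. `parse_user_edit_id` is a hypothetical helper name; the real
test additionally reads the file from a `CARGO_MANIFEST_DIR`-relative
path, which is left out here so the sketch stays self-contained:

```rust
/// Extracts `N` from a line of the form
/// `export const USER_EDIT_SOURCE_INFO_ID = N;`.
/// Returns `None` if the declaration is missing or `N` is not an integer.
pub fn parse_user_edit_id(ts_source: &str) -> Option<usize> {
    const PREFIX: &str = "export const USER_EDIT_SOURCE_INFO_ID";
    ts_source.lines().find_map(|line| {
        let rest = line.trim().strip_prefix(PREFIX)?;
        // Tolerate an optional type annotation such as `: number` before `=`.
        let (_, value) = rest.split_once('=')?;
        value.trim().trim_end_matches(';').trim().parse().ok()
    })
}
```

A parity test would then assert that the parsed value equals the Rust
constant, so a rename or value change on either side fails loudly.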

Snapshot regeneration:
* 62 `.snap` files in `crates/pampa/snapshots/json/` regenerated.
  Diff is exactly the plan's predicted shape: every `"s":N` reference
  shifts to `"s":N+1`, every `Concat` piece `source_info_id` shifts
  by +1, and each pool gains a new entry at index 0:
    `{"d":{"by":{"kind":"user-edit"}},"r":[0,0],"t":4}`.

Example-fixture regeneration:
* 20 `ts-packages/annotated-qmd/examples/*.json` +
  `test/fixtures/math-with-attr.json` rebuilt by running
  `cargo run --bin pampa -- -t json -i <each>.qmd`. Same +1 shift on
  every `s:` reference plus the new pool[0] entry. Required because
  the TS test suite reads these fixtures and indexes into the pool
  by the `s:` field.

Verification:
* `cargo nextest run --workspace` → 9731 pass (+4 vs Phase 5: the new
  reserved-slot and TS-parity tests, each running once as a unit test
  and once via the integration binary).
* `cargo xtask verify` (full hub-build leg) → all 12 steps green
  including WASM rebuild + q2-preview-spa bundle.
* annotated-qmd: 2/156 known failures remain (bd-1d6io,
  source-tracking off-by-one) — unchanged from Phase 5; not caused
  by this pool shift.

Plan: claude-notes/plans/2026-05-29-q2-preview-plan-7f-prereqs.md
…tructors

Foundation for Plan 7f Phase 6 (test audit) and Phase 6.5 (production
residue sweep). Adds, in `crates/quarto-source-map/src/source_info.rs`:

- `By::test_scaffold()` — non-atomic, `kind: "test-scaffold"`. Paired
  with `SourceInfo::for_test()` for tests that need a `SourceInfo`
  field but have no real provenance to record.
- `SourceInfo::for_test()` — convenience that returns
  `Generated{by: test_scaffold(), from: []}`. Replaces
  `SourceInfo::default()` in test code; intentionally produces
  *different* writer dispatch (R5/R3 synthesize vs R1-empty-range
  copy) because the new behavior is the correct one for AST without
  real source bytes.
- `By::config_default()` / `By::programmatic_config()` — non-atomic
  sentinel kinds for `ConfigValue` residue sites (Phase 6.5
  `config_value.rs` fixes lean on these).
- `By::is_programmatic_sentinel()` — predicate matching
  `config-default | programmatic-config | unknown`. Replaces the
  pre-7f `source == &SourceInfo::default()` comparison in
  `navigation_href.rs`.
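
A rough sketch of the constructors and predicate listed above. The type
shapes are assumptions made for illustration: the real `By` may carry
additional data, and the real `SourceInfo` has more variants than the
two shown here.

```rust
/// Simplified stand-in for the provenance tag.
#[derive(Debug, Clone, PartialEq)]
pub struct By {
    pub kind: String,
}

impl By {
    fn of(kind: &str) -> Self {
        By { kind: kind.to_string() }
    }
    pub fn test_scaffold() -> Self {
        Self::of("test-scaffold")
    }
    pub fn config_default() -> Self {
        Self::of("config-default")
    }
    pub fn programmatic_config() -> Self {
        Self::of("programmatic-config")
    }
    pub fn unknown() -> Self {
        Self::of("unknown")
    }

    /// True for the kinds that mark a value as programmatically produced.
    pub fn is_programmatic_sentinel(&self) -> bool {
        matches!(
            self.kind.as_str(),
            "config-default" | "programmatic-config" | "unknown"
        )
    }
}

/// Two-variant miniature of the real type.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file_id: usize, start: usize, end: usize },
    Generated { by: By, from: Vec<SourceInfo> },
}

impl SourceInfo {
    /// Test-only provenance: generated, with no source bytes behind it.
    pub fn for_test() -> Self {
        SourceInfo::Generated {
            by: By::test_scaffold(),
            from: Vec::new(),
        }
    }
}
```

Note that `test-scaffold` is deliberately outside the sentinel set, so
test fixtures are not mistaken for programmatic config values.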

Six new unit tests cover: constructor shape (kind/data) for each new
`By::*`, non-atomicity for all four new kinds, `is_programmatic_sentinel`
positive/negative cases, and `SourceInfo::for_test` shape. The existing
`test_by_is_atomic_kind` was extended with three new negative
assertions so a future change can't silently promote `test-scaffold`,
`config-default`, or `programmatic-config` to atomic without breaking
the test.

No production callers yet — those land in subsequent commits per the
Phase 6 / 6.5 work-item split in CURRENT.md.
…olding

Plan 7f Phase 6 — first batch of the test audit. All sites in this
commit are structural test scaffolding (constructors that require a
SourceInfo field; no real source bytes exist for the hand-crafted
fixture).

Sites touched (test code only):

- crates/quarto-xml/src/types.rs — 11 sites in `mod tests`
  (XmlAttribute / XmlElement constructor scaffolding).
- crates/quarto-yaml-validation/src/tests.rs — 3 sites in
  `make_yaml_*` helpers.
- crates/quarto-yaml-validation/src/validator.rs — 14 sites inside
  the file's `#[cfg(test)] mod tests` (yaml_scalar / yaml_array /
  yaml_object / test_navigate_nested fixtures).
- crates/quarto-yaml-validation/src/schema/parsers/combinators.rs:66
  — local `source_info()` test helper.
- crates/quarto-yaml-validation/src/schema/helpers.rs:172 — same
  pattern, local `source_info()` test helper.
- crates/quarto-ast-reconcile/src/generators.rs:631 — proptest
  generator for Shortcode.
- crates/quarto-core/tests/integration/{jupyter_integration,
  navigation_e2e, navigation_merge, engine_merge, attribution_*}.rs
  — 35 sites across the 8 quarto-core integration tests that build
  hand-crafted Pandoc AST + ConfigValue fixtures.

Behavior implications (per CURRENT.md's writer dispatch note):

- `SourceInfo::default()` is `Original{FileId(0),0,0}` →
  `preimage_in(FileId(0))` returns `Some(0..0)` (empty range) → R1
  copies zero bytes. `for_test()` is
  `Generated{by:test_scaffold, from:[]}` → `preimage_in` returns
  `None` → R5/R3 synthesize (or pass-through wrapper). The new
  behavior is the correct one for AST with no real source bytes,
  and no test in this batch asserts on writer byte output.
- `navigation_href.rs:382` still uses `source == &SourceInfo::default()`
  (Phase 6.5 will swap this to `is_programmatic_sentinel()`). For
  the navigation_e2e / _merge / attribution tests in this commit,
  the swap is benign: `for_test()` no longer equals `default()`,
  but `resolve_byte_range()` returns `None` for the empty-from
  `Generated`, so navigation_href takes the "Concat/Filter"
  fall-through path and returns `raw` unchanged — same outcome as
  the old explicit short-circuit.
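
The dispatch difference described above can be illustrated with a
simplified model. `WriterAction` and `dispatch` are hypothetical names,
and this `preimage_in` only handles the two cases discussed here; the
real method presumably resolves richer `SourceInfo` shapes.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FileId(pub usize);

/// Two-variant miniature of the real `SourceInfo`.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file: FileId, start: usize, end: usize },
    Generated { kind: &'static str, from: Vec<SourceInfo> },
}

impl SourceInfo {
    /// Byte range this node occupies in `file`, if any (simplified).
    pub fn preimage_in(&self, file: FileId) -> Option<Range<usize>> {
        match self {
            SourceInfo::Original { file: f, start, end } if *f == file => {
                Some(*start..*end)
            }
            _ => None,
        }
    }
}

/// Hypothetical summary of the writer's choice for one node.
#[derive(Debug, PartialEq)]
pub enum WriterAction {
    /// Copy the given source bytes verbatim (the R1 path).
    CopyBytes(Range<usize>),
    /// No source bytes exist; synthesize the output (the R5/R3 path).
    Synthesize,
}

pub fn dispatch(info: &SourceInfo, file: FileId) -> WriterAction {
    match info.preimage_in(file) {
        Some(range) => WriterAction::CopyBytes(range),
        None => WriterAction::Synthesize,
    }
}
```

In this model the old default shape copies an empty range, while the
`for_test()` shape falls through to synthesis.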

Schema/merge.rs:32,51,88 and schema/mod.rs:256 (the 4 production
SchemaError::InvalidStructure sites) intentionally not touched —
they belong to Phase 6.5's `location: Option<SourceInfo>` refactor.

Test results: per-crate `cargo nextest run` clean across all four
crates (24/24 quarto-xml, 265/265 quarto-yaml-validation, 218/218
quarto-ast-reconcile, 2199/2199 quarto-core).

Plan 7f Phase 6: pampa batch. All swapped sites are test
scaffolding: pampa/tests/* (the 18 integration test files —
156 sites) and the `#[cfg(test)] mod tests` blocks inside
pampa/src/* (85 sites). Plus crates/pampa/src/lua/filter_tests.rs
(included via `#[cfg(test)] #[path = "filter_tests.rs"] mod` —
the whole file is test code, 156 more sites).

Test results: `cargo nextest run -p pampa` clean (3907/3907 pass,
2 skipped). No assertion-on-byte-output tests in this batch
regressed under the R1-empty-range → R5/R3-synthesize dispatch
shift that follows from for_test()'s non-Original shape.

Production-residue audit (deferred): per `git grep
'SourceInfo::default()' crates/pampa/src/`, 42 sites remain in
pampa src that are NOT inside `#[cfg(test)]`. Per-file breakdown:

- `readers/json.rs` — 7 sites, all marked "Legitimate default:
  backward compat" for legacy Pandoc JSON without source info.
  Explicitly allowed by `provenance-contract.md` §10. Will need
  `#[allow(deprecated)]` annotations under Phase 7's
  `#![deny(deprecated)]`.
- `lua/types.rs` (8), `lua/utils.rs` (10), `lua/readwrite.rs` (2)
  — Lua-side fallbacks where `filter_source_info` is expected to
  overwrite `SourceInfo::default()` with `Generated{by:filter,…}`
  before the AST is consumed. Producer contract acknowledges
  this pattern at the call-stack level.
- `citeproc_filter.rs` (3), `pandoc/meta.rs` (3),
  `writers/json.rs` (2), `toc.rs` (2),
  `template/config_merge.rs` (5) — genuine production residue
  the Phase 6.5 plan did NOT enumerate. Most need a new
  `By::citeproc()`/`By::yaml_error_recovery()`/`By::toc_synth()`
  kind or routing through `By::programmatic_config()` /
  `By::unknown()`. Surfacing as a per-site decision before Phase
  7 deprecation lands.

This commit ships the 312 test-only swaps (test_scaffold writer
dispatch is benign for tests that don't assert on byte output).
Production sites tracked separately for Plan 7f Phase 6.5
extension.

Plan 7f Phase 6: final test-audit batch. Covers all remaining
crates with `SourceInfo::default()` test-scaffolding sites: 57
PURE_TEST files (where no production residue exists) bulk-swapped
end-to-end, plus 28 MIXED files where the swap was scoped to the
`#[cfg(test)] mod tests` region. Plus one `tests/integration/*.rs`
file (`quarto-sass/.../brand_config_test.rs`) that's all test code
by virtue of living under `tests/`.

Affected crates: quarto-core (all transforms, stages, engine
helpers, project plumbing), quarto-navigation (all subviews),
quarto-pandoc-types/config_value.rs (95 test sites + 1 unused
sentinel-equality test pinned to default() — see below),
quarto-pandoc-types/inline.rs, quarto-config (all submodules),
quarto-sass, quarto-doctemplate, quarto-yaml, quarto-publish,
plus the integration brand_config_test.

Two assertion-pin fixes after sed swept too eagerly:
- `quarto-core/src/stage/stages/engine_execution.rs:1378` —
  `test_execution_context_has_source_info` asserts against the
  production `ExecutionContext::new` default. RHS reverted to
  `SourceInfo::default()` with a comment; Phase 7's deprecation
  will surface engine/context.rs:92 as a residue site and the
  assertion gets updated alongside.
- `quarto-pandoc-types/src/inline.rs:1459` — `source_info_attr_empty`
  pins the `InlineAttr::new` fallback. RHS reverted to `default()`;
  this test is on Phase 6.5's deletion list (the InlineAttr::new
  signature refactor removes the fallback entirely).

Production residue remains (not part of this commit, surfaced for
Phase 6.5 + Phase 7):

- Planned Phase 6.5 sites (enumerated in CURRENT.md): config_value.rs
  (5), project_resources.rs (2), navigation_href.rs (1+2 follow-up),
  citeproc/output.rs (1), config/materialize.rs (3), listing/feed/
  stage.rs (2), yaml-validation/schema/merge.rs+mod.rs (4),
  pandoc-types/inline.rs (InlineAttr refactor + IntermediateAttr
  widening, ~10 sites).
- Discovered residue not in plan: ~70 additional production sites
  across pampa (citeproc_filter, toc, pandoc/meta, writers/json,
  template/config_merge, lua/types, lua/utils, lua/readwrite),
  quarto-analysis, quarto-core engine/jupyter, quarto-core
  transforms (callout_resolve, categories_sidebar, shortcode_resolve,
  sidebar_auto, theorem, …), quarto-navigation. These will be
  surfaced by Phase 7's `#![deny(deprecated)]` once the deprecation
  attribute lands; fixes can be applied per-site or temporarily
  allow-listed at that time.
- Legitimate `SourceInfo::default()` calls retained per the producer
  contract: 7 in `pampa/src/readers/json.rs` (Pandoc legacy-JSON
  backward compat, explicitly allowed by `provenance-contract.md`
  §10), 1 in `quarto-source-map/src/source_info.rs` (the actual
  `impl Default for SourceInfo` body — Phase 7 deprecates this).

Workspace tests: 9736/9736 pass, 196 skipped.
…config_default / programmatic_config

Plan 7f Phase 6.5 — first production-residue commit. Replaces four
of the five `SourceInfo::default()` sites in
`crates/quarto-pandoc-types/src/config_value.rs` with explicit
`Generated{by:…}` provenance:

- `impl Default for ConfigValue` (line 415) →
  `Generated{by: By::config_default()}`. The empty-Map sentinel
  used by every `ConfigValue::default()` caller.
- `ConfigValue::from_path` (line 539) →
  `Generated{by: By::programmatic_config()}`. WASM-bridge
  programmatic injection.
- `ConfigValue::insert_path` intermediate map + key_source (lines
  822, 826) → same `programmatic_config` provenance.
- Doc-comment example for `insert_path` updated to show the new
  shape.

(Fifth `default()` site was on the assertion side of the now-fixed
`source_info_attr_empty` test — that test still asserts against
the production fallback in `InlineAttr::new`, which Phase 6.5's
InlineAttr refactor removes.)

Reader-side compatibility: `crates/pampa/src/readers/json.rs:2212`
(top-level meta) updated to match. The JSON wire format does not
carry a per-meta `s:` field (Pandoc-compatible), so the reader
stamps the meta with the same `config_default` kind the writer's
`ConfigValue::default()` now produces. Without this, every JSON
round-trip would observably drop the meta's source_info; the
`test_json_roundtrip_simple_paragraph` test caught it. The five
other "Legitimate default" sites in the same function
(2191/2195/2199/2315/2339 — backward-compat for legacy
Pandoc-only JSON without `key_sources`) are deliberately left as
`SourceInfo::default()` for now; Phase 7's deprecation will surface
them as `#[allow(deprecated)]` candidates.

Workspace tests: 9736/9736 pass, 196 skipped.

Plan 7f Phase 6.5: second production-residue commit. Replaces the
remaining enumerated sites in `quarto-core`:

- `crates/quarto-core/src/project_resources.rs:123` —
  `Pattern::without_source` was using `SourceInfo::default()` as a
  scaffolding sentinel. Now `Generated{by: By::unknown()}`.
- `crates/quarto-core/src/project_resources.rs:541` — Engine /
  Lua-filter resource entries don't carry a YAML source location;
  the call to `canonicalize_within_project` still requires a
  `SourceInfo` per the current signature. Replaced
  `&SourceInfo::default()` with `&SourceInfo::generated(By::unknown())`.
  Follow-up beads issue **bd-3az78** filed to refactor
  `canonicalize_within_project` to take `Option<&SourceInfo>`.
- `crates/quarto-core/src/transforms/navigation_href.rs:382` — the
  programmatic-sentinel detector. Pre-Phase-6.5 code compared
  `source == &SourceInfo::default()`; that equality survives only
  as long as `Original{FileId(0),0,0}` is the canonical sentinel.
  Replaced with the producer-side predicate:
  `let SourceInfo::Generated { by, .. } = source && by.is_programmatic_sentinel()`.
  Matches the `config-default | programmatic-config | unknown`
  set introduced earlier in Phase 6.5. Doc-comment for the
  function updated to describe the new shape.
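
A sketch of the detector change, under simplifying assumptions: `kind`
is stored directly on the `Generated` variant, and `is_programmatic` is
a hypothetical helper name. It uses `matches!` with a guard, which
expresses the same test as the let-chain quoted above and also compiles
on toolchains that predate let-chains.

```rust
/// Two-variant miniature of the real `SourceInfo`.
#[derive(Debug, Clone, PartialEq)]
pub enum SourceInfo {
    Original { file_id: usize, start: usize, end: usize },
    Generated { kind: String },
}

/// The sentinel set named in the commit message.
fn is_programmatic_sentinel(kind: &str) -> bool {
    matches!(kind, "config-default" | "programmatic-config" | "unknown")
}

/// Was this value injected programmatically rather than read from user YAML?
/// Only `Generated` values with a sentinel kind qualify; an
/// `Original { 0, 0, 0 }` value is no longer treated as a sentinel.
pub fn is_programmatic(source: &SourceInfo) -> bool {
    matches!(
        source,
        SourceInfo::Generated { kind } if is_programmatic_sentinel(kind)
    )
}
```

The predicate no longer depends on which value `SourceInfo::default()`
happens to produce, which is the point of the swap.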

Workspace tests: 9736/9736 pass, 196 skipped.
…tion → Option<SourceInfo>

Plan 7f Phase 6.5 — eliminates the last residual `SourceInfo::default()`
sites in quarto-yaml-validation. The variant's location field is now
`Option<SourceInfo>`, distinguishing two semantically distinct cases:

- **`Some(...)`** — error arose while validating user-supplied YAML
  against a schema. ~33 call sites in
  `schema/{helpers,parser,parsers/*}.rs` already pass a real
  `value.source_info.clone()` / `item.source_info.clone()` from the
  parsed YAML node; each wrapped in `Some(...)`.
- **`None`** — error describes a bug in the schema *definition*
  itself (no user-YAML to point at). 4 sites:
  `schema/merge.rs:32, 51, 88` and `schema/mod.rs:250` previously
  passed `quarto_yaml::SourceInfo::default()` as a placeholder.

Formatter (`error.rs:33-46`) now branches on `Option`: present →
`"… (at offset N)"`, absent → no span suffix.

Test pattern-matching at all destructure sites uses `{ message, .. }`
so no test code needed updating. Added a regression test
`test_schema_error_invalid_structure_display_no_location` for the
new None branch.
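
The `Option`-based formatter can be sketched as below. The struct
fields and the message wording before the suffix are assumptions; only
the `(at offset N)` suffix and the `Some`/`None` split come from the
description above.

```rust
use std::fmt;

/// Simplified location: just a start offset (the real type is richer).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceInfo {
    pub start_offset: usize,
}

#[derive(Debug)]
pub enum SchemaError {
    InvalidStructure {
        message: String,
        /// `Some` for errors in user YAML, `None` for schema-definition bugs.
        location: Option<SourceInfo>,
    },
}

impl fmt::Display for SchemaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SchemaError::InvalidStructure { message, location: Some(loc) } => {
                write!(f, "{} (at offset {})", message, loc.start_offset)
            }
            SchemaError::InvalidStructure { message, location: None } => {
                // No user YAML to point at, so no span suffix.
                write!(f, "{}", message)
            }
        }
    }
}
```

Tests that destructure with `{ message, .. }` keep compiling against
this shape because they never name the `location` field.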

The compiler flagged 37 mismatched-types errors across 7 files,
and the `Some(...)` wrap is mechanical at every call site (it is
what the E0308 help text literally suggests).

Workspace tests: 9737/9737 pass, 196 skipped (one new test).
…t SourceInfo

Plan 7f Phase 6.5 — eliminates the empty-AttrSourceInfo sentinel
that was the last `SourceInfo::default()` site in
`quarto-pandoc-types`. `InlineAttr::new` is now a three-argument
constructor that requires the caller to supply a real
`source_info`. A `new_from_attr_source` convenience preserves the
"derive from non-empty AttrSourceInfo" path for the two test sites
that legitimately want it.

Producer-side: widened the `PandocNativeIntermediate::IntermediateAttr`
enum variant from `(Attr, AttrSourceInfo)` to
`(Attr, AttrSourceInfo, SourceInfo)`, paying the source_info
acquisition once at the producer instead of three times at each
consumer. All three production consumers
(`treesitter.rs:558`, `treesitter_utils/caption.rs:35`,
`treesitter_utils/paragraph.rs:27`) now destructure the third
field and pass it straight through to `InlineAttr::new`.

Producer constructors that emit `IntermediateAttr`:

- `treesitter.rs:1166` (commonmark_specifier) — passes
  `node_source_info_with_context(node, context)`.
- `treesitter.rs:1183` (unnumbered_specifier) — same.
- `treesitter.rs:1202` (attribute_specifier empty fallback) — same.
- `treesitter_utils/commonmark_attribute.rs:58` — gained a `span`
  parameter; callers supply it from their local tree-sitter node.
- `treesitter_utils/info_string.rs:30` — re-uses the language-source
  range (no separate parent span available).
- `treesitter_utils/language_specifier.rs:116` — uses
  `node_source_info_with_context(node, context)` over the
  language_specifier node.
- `treesitter_utils/language_specifier.rs:161` — dead-code fallback
  in `process_nested_language_specifier`, updated for consistency.

Eight consumer destructure sites updated to ignore the new third
field with `, _` (atx_heading, code_span_helpers, editorial_marks,
fenced_code_block ×2, fenced_div_block, span_link_helpers ×2).
None of these production consumers currently uses the
intermediate's source_info — they take their span from the
parent tree-sitter node directly.
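
A compact illustration of the widened variant and the two consumer
styles described above. All types here are simplified stand-ins, and
`to_inline_attr` / `attr_only` are hypothetical function names:

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Attr {
    pub id: String,
    pub classes: Vec<String>,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct AttrSourceInfo {
    pub id: Option<(usize, usize)>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SourceInfo {
    pub start: usize,
    pub end: usize,
}

pub enum Intermediate {
    /// Third field added: the span of the whole attribute node.
    IntermediateAttr(Attr, AttrSourceInfo, SourceInfo),
    Other,
}

#[derive(Debug, PartialEq)]
pub struct InlineAttr {
    pub attr: Attr,
    pub attr_source: AttrSourceInfo,
    pub source_info: SourceInfo,
}

impl InlineAttr {
    /// Three-argument constructor: the caller must supply a real span.
    pub fn new(attr: Attr, attr_source: AttrSourceInfo, source_info: SourceInfo) -> Self {
        InlineAttr { attr, attr_source, source_info }
    }
}

/// Consumer that needs the span: pass the third field straight through.
pub fn to_inline_attr(node: Intermediate) -> Option<InlineAttr> {
    match node {
        Intermediate::IntermediateAttr(attr, attr_source, source_info) => {
            Some(InlineAttr::new(attr, attr_source, source_info))
        }
        Intermediate::Other => None,
    }
}

/// Consumer that takes its span elsewhere: ignore the new field with `_`.
pub fn attr_only(node: Intermediate) -> Option<Attr> {
    match node {
        Intermediate::IntermediateAttr(attr, _, _) => Some(attr),
        Intermediate::Other => None,
    }
}
```

Because the variant is a tuple, every existing destructure site fails
to compile until it acknowledges the third field, which is what makes
the migration compiler-driven.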

Test-code call sites (`InlineAttr::new(empty_attr(),
AttrSourceInfo::empty(), …)`) — six sites in `filters.rs`,
`writers/plaintext.rs`, `lua/types.rs`, `lua/filter.rs` — pass
`SourceInfo::for_test()` as the third argument. Two test sites
in `inline.rs` that exercise the `AttrSourceInfo` → `source_info`
derivation moved to the `new_from_attr_source` convenience method.

Deletes the obsolete `source_info_attr_empty` test (the case it
asserted — empty AttrSourceInfo + InlineAttr::new fallback to
`SourceInfo::default()` — is now structurally impossible).

Doc-comment for `AttrSourceInfo` at `attr.rs:44-46` updated: the
old "fall back to `SourceInfo::default()`" recipe no longer
matches reality (theorem.rs and proof.rs fall back to `None`
already).

Workspace tests: 9736/9736 pass, 196 skipped.

Phase 6 (test audit) and Phase 6.5 (production residue sweep,
enumerated sites) are complete; full `cargo xtask verify` passes
all 12 steps including the WASM build leg. Plan file checkboxes
updated and a new "Discovered production residue" section
catalogues the ~70 unplanned `SourceInfo::default()` sites the
Phase 6 sweep surfaced. Per the plan's "-D deprecated strategy",
these are deferred to Phase 7's compiler-driven audit.

Plan 7f Phase 6.5 extension: apply explicit `By::*` kinds to the
~25 pampa production sites the original plan didn't enumerate.

New `By::citeproc()` constructor (atomic, non-sentinel): citeproc-
rendered content (citation Str replacements, bibliography `Div`s,
`#refs` wrappers) generated by CSL processing. Atomic — the user
edits citation styles via CSL, not through the preview's inline
editing surface. Added to `is_atomic_kind()`'s match arm.

Per-site fixes:

- `pampa/src/template/config_merge.rs` (5 sites — lang default,
  pagetitle, top-level template-defaults map) → `By::config_default()`.
  Template defaults are the canonical "no value in user config, use
  this fallback" semantic.
- `pampa/src/toc.rs:98, 190` (TocEntry → ConfigValue,
  NavigationToc → ConfigValue) → `By::programmatic_config()`.
  Programmatic derivation from in-memory TOC structures.
- `pampa/src/citeproc_filter.rs` (3 sites — citation Str, bib
  entry Div, refs Div wrapper) → `By::citeproc()`.
- `pampa/src/pandoc/meta.rs:95, 97, 231` (yaml-markdown-syntax-error
  recovery Span + yaml-tagged-string Span) → reuse the caller's
  `source_info` for both wrapper and inner Str so attribution points
  at the offending YAML range. The wrapper has the same bytes as the
  inner scalar — no new `By::` kind needed.
- `pampa/src/writers/json.rs:604, 625` (yaml-tagged-string Span
  wrappers around Glob / Expr values) → same fix; reuse the value's
  `source_info` for both wrapper and inner Str.
- `pampa/src/lua/types.rs` (8 sites — Lua-side inline construction
  helpers), `pampa/src/lua/utils.rs` (10 sites — Lua block_to_inlines
  LineBreak separators), `pampa/src/lua/readwrite.rs` (2 sites —
  Lua → ConfigValue conversion) → `By::unknown()`. These are
  Lua-side synthesis; the producer contract acknowledges that
  `filter_source_info` may overwrite with `Generated{by:filter,...}`
  on the way back out from a user filter.

Snapshot regenerated: `crates/pampa/snapshots/json/yaml-tags.snap`
(1 file). The diff reflects correct behavior: yaml-tagged-string spans
now share the YAML source range with their inner content (previously
the wrapper had `Original{0,0,0}`), so the writer's pool intern
coalesces three references that used to be three default entries.
No semantic regression — fewer dead pool entries, identical
inline-level source tracking.

Remaining pampa residue (6 sites in `readers/json.rs`): all
explicitly allowed by `provenance-contract.md` §10 (legacy Pandoc
JSON backward-compat). They will need `#[allow(deprecated)]` when
Phase 7's deprecation lands.

Workspace tests: 9737/9737 pass, 196 skipped.
…quarto-config,quarto-citeproc): Phase 6.5 residue cleanup workspace-wide

Plan 7f Phase 6.5 extension — apply explicit `By::*` kinds across
the remaining ~70 production sites the original plan didn't
enumerate. Every non-test `SourceInfo::default()` in the workspace
now has a deliberate provenance kind; the only retained sites are
the 5 contract-allowed legacy-Pandoc-JSON sites in
`crates/pampa/src/readers/json.rs`.

`By::*` kinds (3; two new here, one reused) in `crates/quarto-source-map/src/source_info.rs`:

- `By::jupyter_output()` — atomic. Synthesized blocks/inlines from
  kernel execution (Jupyter cell stdout / stderr, rich-display MIME
  bundles, error tracebacks). Regenerated on every re-run, so the
  preview's inline editor must not touch it.
- `By::callout()` — non-atomic. Wraps callout-decoration synthesis
  (default-title injection, screen-reader-only type announcement);
  the user's actual callout body stays editable through the preview.
  Atomicity decision per the worked example in
  `claude-notes/designs/provenance-contract.md` §3.
- `By::citeproc()` was added earlier in this phase and is reused
  here for `quarto-citeproc/src/output.rs:1274`.
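
The atomicity split can be sketched as follows. This covers only the
kinds named in this commit series, modelled as a hypothetical `Kind`
enum; the real `is_atomic_kind()` matches on the actual kind
representation and may include other kinds.

```rust
/// Provenance kinds named in this commit series (not the full set).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Citeproc,
    JupyterOutput,
    Callout,
    TestScaffold,
    ConfigDefault,
    ProgrammaticConfig,
}

impl Kind {
    /// Atomic kinds are regenerated by their producer (CSL processing,
    /// kernel execution), so an inline edit would be overwritten.
    pub fn is_atomic(self) -> bool {
        matches!(self, Kind::Citeproc | Kind::JupyterOutput)
    }
}

/// Hypothetical front-end gate: allow inline edits only for non-atomic content.
pub fn allows_inline_edit(kind: Kind) -> bool {
    !kind.is_atomic()
}
```

Callout decoration stays non-atomic here because the user's callout
body remains editable through the preview.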

Per-site application:

- `quarto-citeproc/src/output.rs:1274` (`empty_source_info()` helper)
  → `By::citeproc()`.
- `quarto-core/src/engine/context.rs:92` (`ExecutionContext::new`)
  → `By::unknown()`. Matching assertion in
  `engine_execution.rs:1378` updated.
- `quarto-core/src/engine/jupyter/{output.rs ×11, transform.rs ×1}`
  → `By::jupyter_output()`. Stream output, error tracebacks, MIME
  bundle conversion (text/plain, text/html, text/markdown,
  text/latex, image/* placeholders), and the inline-Code →
  Inline-Str expression-result swap in `transform.rs:279`.
- `quarto-core/src/transforms/callout_resolve.rs` (3 sites) →
  `By::callout()`. Default-title Str, screen-reader-only Span
  wrapper, both child source_infos.
- `quarto-core/src/transforms/shortcode_resolve.rs` —
  `config_value_to_inlines` (9 sites) + `lua_result_to_shortcode_result`
  (1 site) + `flatten_blocks_to_inlines` (1 inter-paragraph
  `Space` separator) reuse the surrounding `ConfigValue.source_info`
  or the shortcode token's source range so the canonical stamper
  pass downstream can wrap with the `Invocation` anchor. (The full
  enrichment chain — `Generated{by: shortcode, from: [Invocation]}`
  — happens at `stamp_block` / `stamp_inline`; this commit fixes
  the *innermost* synthesis sites.)
- `quarto-core/src/transforms/sidebar_auto.rs` (4),
  `categories_sidebar.rs` (3), `sidebar_render.rs` (2),
  `sidebar_generate.rs` (1), `page_nav_render.rs` (1),
  `navbar_render.rs` (1), `footer_render.rs` (1),
  `toc_render.rs` (1), `listing_render.rs` (1),
  `navigation_enrich.rs` (1) → `By::programmatic_config()`.
  All of these synthesize config storage for rendered-HTML strings
  or navigation items.
- `quarto-core/src/stage/stages/metadata_merge.rs` (4),
  `listing_item_info.rs` (2), `math_js.rs` (1) →
  `By::programmatic_config()`. Stage-processing intermediates
  where source bytes don't exist.
- `quarto-core/src/project/listing/feed/{stage.rs, complete.rs}`,
  `listing/post_render_upgrade/substitute.rs` — five diagnostic
  builders → `By::unknown()`. Span-less diagnostics degrade
  gracefully through the existing `with_location` formatter.
- `quarto-core/src/project/listing/config.rs:113`
  (`Listing::default().categories_source`) →
  `By::programmatic_config()`. Doc comment updated.
- `quarto-config/src/materialize.rs` (3 sites: `key_source`,
  `MergedValue::Map` source_info fallback, missing-path
  `ConfigValue::null`) → `By::programmatic_config()` /
  `By::unknown()` per site.
- `quarto-analysis/src/transforms/shortcode.rs` (7 sites) — reuse
  the shortcode token's source range; same pattern as the
  canonical `shortcode_resolve.rs` enrichment, in the simpler
  static-analysis form.
- `quarto-navigation/src/{page_nav,navbar,sidebar,footer,item}.rs`
  (16 sites) → `By::programmatic_config()`. Navigation items
  synthesized without YAML source context.
- `quarto-core/src/transforms/theorem.rs:312` doc-comment update
  (the actual fall-back recipe is `None`, not `SourceInfo::default()`,
  in the post-Phase-6.5 code).

Doc-comment-only references in
`shortcode_resolve.rs:172` and `navigation_href.rs:381` retained
as historical references — they describe pre-Phase-6.5 behavior.

Workspace tests: 9739/9739 pass, 196 skipped (3 new tests for the
new `By::*` kinds in `quarto-source-map`).

Updates CURRENT.md to reflect that the discovered production
residue (~70 unplanned sites) was addressed inline rather than
deferred to Phase 7. Three new `By::*` kinds were defined during
the sweep: `By::citeproc()`, `By::jupyter_output()`,
`By::callout()`. After this commit, only 6 production
`SourceInfo::default()` callers remain — 5 contract-allowed
legacy-Pandoc-JSON sites in `pampa/src/readers/json.rs` and the
`impl Default for SourceInfo` body itself.

Full `cargo xtask verify` passes all 12 steps including WASM/SPA.
Phase 7's compiler-driven audit now has a much smaller surface
to cover — most of the heavy lifting moved into Phase 6.5.